API Integration Best Practices

Before integrating with the API, it’s highly recommended to first build and run a job within our graphical user interface. Once you’ve run a job successfully and are satisfied with the results, the job launch process can be automated. Find the full Appen API Request Guide here.

Please visit this link for our new Developer API Documentation.

To run more data through your existing job template, you can either 1) upload more rows to the existing job or 2) copy the job and upload data to the newly created job ID.

Note: The default maximum number of rows is 250,000. The Team admin can increase this limit to 1 million.

When integrating, it is important to decide how you want to receive the results. Below are the two most common methods:

  • Webhook Integration
    • This route allows rows to be sent automatically to a webhook as soon as they finalize
    • Allows you to get results back in real time
      • Note: A web application will need to be set up to accept POST requests from Appen.
  • Standard API Integration
    • This route allows you to pull finalized rows via an API request in batches

If you would like to copy an existing job:

  • API Command to copy existing job with only test questions:
    • curl -H 'Authorization: Token token={api_key}' -X GET "{job_id}/copy.json?gold=true"
      • Note: the ID of the job being copied is required
    • The "gold=true" parameter tells the API to copy the job with test questions only. There are variations of this request that can be used to copy a job with all of its rows (source data + test questions) or none of its rows. If you intend to copy without rows and then upload test questions, see this request.
    • The API responds with a JSON structure of the job. Included in the JSON will be an attribute called "id" which is the job ID of the newly created copy. Save this ID and use it in requests moving forward.
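As a minimal sketch of saving the new job ID, the snippet below parses a copy response and reads its "id" attribute. The function name `copied_job_id` and the sample response body are hypothetical; a real response contains the full JSON structure of the job.

```python
import json


def copied_job_id(response_text):
    """Return the "id" attribute from the JSON body of a copy response."""
    job = json.loads(response_text)
    if "id" not in job:
        raise ValueError("copy response has no 'id' attribute")
    return job["id"]


# Hypothetical, abbreviated response body used only for illustration.
sample_response = '{"id": 123456, "title": "Example job (copy)"}'
new_job_id = copied_job_id(sample_response)
print(new_job_id)
```

The returned value is what would be substituted for {job_id} in the requests that follow.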

1. Upload Data

  • For both integration routes:
    • To upload rows one at a time, use this request:
      • curl -H 'Authorization: Token token={api_key}' -X POST -d "unit[data][{column1}]={some_data_1}" -d "unit[data][{column2}]={some_data_2}" {job_id}/units.json
      • Note: The data for the row is specified as header:value pairs, one for each column in that row. The ID of the job the data is being uploaded to is also required.
    • To batch upload data with a CSV file, use this request:
      • curl -H 'Authorization: Token token={api_key}' -X PUT -T "{sample_data.csv}" -H "Content-Type: text/csv" {job_id}/upload.json
    • Note: Instead of a job ID, an alias can be specified for the job. Please contact your Customer Success Manager to set this up.
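When rows are uploaded from a script rather than from curl, the unit[data][{column}]={value} fields from the single-row request need to be form-encoded. The sketch below shows one way to build that body with Python's standard library. The function name `unit_form_body` and the sample column names are hypothetical, and the sketch only builds the body; it does not send the request.

```python
from urllib.parse import parse_qs, urlencode


def unit_form_body(row):
    """Encode one row as unit[data][<column>]=<value> form fields."""
    fields = {f"unit[data][{column}]": value for column, value in row.items()}
    # urlencode percent-escapes characters such as '&' and spaces in the values.
    return urlencode(fields)


# Hypothetical row with two columns.
body = unit_form_body({"company": "Acme & Sons", "url": "http://example.com/a b"})
print(body)

# Decoding the body shows the header:value pairs a server would receive.
print(parse_qs(body))
```

Encoding the values this way keeps characters like "&" inside a cell from being read as a field separator.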

2. Launch

  • Since all of the settings from the template are retained when copying, the job should be ready to launch using this request:
    • curl -H 'Authorization: Token token={api_key}' -X POST -d "channels[0]=on_demand&debit[units_count]={100}" {job_id}/orders.json
      • Note: this command will launch to all external channels. The number of rows being launched is required along with the job ID.
      • To determine the number of rows to launch, use the ‘job status’ call (see Monitor below) to get the total number of rows uploaded to the job.
  • When using a webhook:
    • The job should be set up to automatically launch newly added rows using this request:
      • curl -H 'Authorization: Token token={api_key}' -X PUT --data-urlencode "job[auto_order]=true" {job_id}.json
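The body of the launch request can also be assembled programmatically once the row count is known. This is a small sketch that builds the same channels[0] and debit[units_count] fields shown in the curl command. The function name `order_form_body` is hypothetical, and the row count is passed in by the caller rather than read from any particular response field.

```python
from urllib.parse import parse_qs, urlencode


def order_form_body(units_count, channels=("on_demand",)):
    """Build the form body for a launch request for `units_count` rows."""
    if units_count <= 0:
        raise ValueError("units_count must be a positive number of rows")
    fields = [(f"channels[{index}]", name) for index, name in enumerate(channels)]
    fields.append(("debit[units_count]", str(units_count)))
    return urlencode(fields)


launch_body = order_form_body(100)
print(parse_qs(launch_body))
```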

3. Monitor

  • It is recommended to monitor the job in the UI for highly missed test questions to ensure the job is running smoothly. After the initial monitoring, pinging the job with the following request will show how many rows have finalized and how many are still remaining:
    • curl -H 'Authorization: Token token={api_key}' -X GET {job_id}/ping.json
      • The response will look like the following. When "needed_judgments" is 0, the job is finished:
        • {"golden_units":30,"all_units":20703,"ordered_units":20673,"completed_units_estimate":17887,"needed_judgments":8469,"all_judgments":88984,"tainted_judgments":24825,"completed_gold_estimate":25,"completed_non_gold_estimate":17862}
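The finished check described above can be automated with a simple polling loop. The sketch below treats a job as finished when "needed_judgments" is 0, as stated in this article. The names `is_finished` and `wait_until_finished` are hypothetical, and `fetch_ping` stands in for whatever code performs the ping request and returns the response text; here it is replaced by canned responses so the example runs without network access.

```python
import json
import time


def is_finished(ping_text):
    """Return True when the ping response reports no judgments still needed."""
    return json.loads(ping_text)["needed_judgments"] == 0


def wait_until_finished(fetch_ping, interval_seconds=60.0, max_polls=100, sleep=time.sleep):
    """Call fetch_ping() until the job is finished; return the number of polls made."""
    for attempt in range(1, max_polls + 1):
        if is_finished(fetch_ping()):
            return attempt
        sleep(interval_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Canned responses standing in for three successive ping requests.
responses = iter([
    '{"needed_judgments": 8469}',
    '{"needed_judgments": 12}',
    '{"needed_judgments": 0}',
])
polls = wait_until_finished(lambda: next(responses), sleep=lambda seconds: None)
print(polls)
```

Passing `sleep` as a parameter keeps the loop easy to test; in real use the default `time.sleep` applies and the polling interval can be tuned to the size of the job.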

4. Results

  • For Standard API integration:
    • Download a CSV of any report using this request:
      • curl -H 'Authorization: Token token={api_key}' -o "{filename}.zip" -L "{job_id}.csv?type={report_type}"
      • Note: specify a filename for the output file, the job ID to get the report from, and the type of report needed.
    • Here are the available report types:
      • full - Returns the Full report containing every judgment
      • aggregated - Returns the Aggregated report containing the aggregated response for each row
      • json - Returns the JSON report containing the aggregated response, as well as the individual judgments
    • Once downloaded, the file will need to be unzipped.
  • When using a webhook:
    • When a row finalizes in your job, meaning it has collected the required number of judgments, Appen will send a POST request to the webhook URL containing all of the information associated with that row. Please see this article for more information about Appen Webhook Basics.
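For the Standard API route, the unzip step can be folded into the integration script. The sketch below reads the first CSV file found inside a downloaded report archive. It assumes a CSV-based report (full or aggregated) encoded as UTF-8, which this article does not state; the function name `read_report_rows`, the file name, and the column names are hypothetical. The example builds a tiny archive in memory so it can run without a real download.

```python
import csv
import io
import zipfile


def read_report_rows(zip_bytes):
    """Return the rows (as dicts) of the first CSV file inside a zipped report."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_names = [name for name in archive.namelist() if name.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in the archive")
        with archive.open(csv_names[0]) as raw:
            # Assumption: the report is UTF-8 encoded.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a small stand-in archive with hypothetical columns.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.csv", "col_a,col_b\n1,x\n2,y\n")

rows = read_report_rows(buffer.getvalue())
print(rows)
```

With a real download, the bytes of the saved {filename}.zip file would be passed in place of `buffer.getvalue()`.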

