In this article, you will find the key terms commonly used on the Appen Data Annotation Platform (ADAP).
A job is composed of a customizable interface that connects your data to an online workforce. Each job in Appen contains data rows, instructions, customizable questions for your use case (written in CML), and test questions, and is worked on by contributors. Contributors submit judgments on the rows of data via a worker interface. All jobs in a single account can be found here and are identified by a unique numeric ID.
The jobs that work best on Appen typically share the following characteristics:
- They are sizable tasks that would be unreasonable or inefficient for one person (or even a small team of people) to complete on their own.
- They can be completed from a computer but usually cannot be fully automated or carried out entirely by a computer.
- They can be organized into consistent, discrete steps that contributors can complete independently.
Some examples of ideal jobs can be found here.
A row of data is uploaded to the job from your source data file via API or GUI. Judgments are contributor answers on a particular data row.
Contributors complete groups of rows at a time, called pages. Each page is a collection of one or more randomly selected rows of data. Each time a contributor clicks the ‘Submit’ button in a job, they are completing a page of work and will be paid for that entire page of work. If your job uses test questions, which is always recommended, each page will contain one test question by default.
Test questions serve the dual purpose of training contributors and monitoring contributor performance. Contributors are given a score that reflects their accuracy on test questions in a given job. If a contributor answers a test question incorrectly during work mode, their accuracy is reduced and they are shown the correct answer along with the reasoning behind it.
A judgment is the set of answers submitted by a contributor on a row of data. It is recommended to collect multiple judgments per row and either compare them to one another or aggregate them to the top response. For each job, you can specify the number of judgments you would like each row to receive. If you would like five judgments per row, that means five different contributors will need to provide an answer to every row before the job is finished.
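As a back-of-the-envelope sketch (the row and judgment counts below are made up for illustration), the total number of trusted judgments a job must collect is simply the row count times the judgments-per-row setting:

```python
# Illustrative numbers only; judgments-per-row is set on the job's settings page.
rows = 1000            # data rows uploaded to the job
judgments_per_row = 5  # trusted judgments required for each row

total_judgments = rows * judgments_per_row
print(total_judgments)  # 5000 trusted judgments before the job can finish
```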
A trusted judgment is an answer from a contributor with an accuracy score higher than the minimum accuracy you set on the settings page. All trusted judgments are included in your results.
An untrusted judgment, also known as a ‘tainted judgment’, is an answer from a contributor whose accuracy score has fallen below the minimum accuracy you set. Untrusted judgments are not included in your results unless you specify otherwise. Note that you will not collect any tainted judgments if you run a job without test questions.
The number of judgments still needed for the job to complete will fluctuate, because contributors occasionally move from trusted to untrusted. This count is only relevant while a job is running.
Appen has scaled to the world’s largest pool of online contributors by partnering with dozens of websites that maintain large online communities. We call these partners “channels.” Our contributors access Appen jobs via offer walls on channel websites. Examples of channels include ClixSense, Swagbucks, and Neobux.
Contributors are the people who work on your job and are compensated for their judgments. Individual contributors are identifiable by a Contributor ID.
Accuracy is the contributor’s score on test questions in a single job. If a contributor’s accuracy falls below a preset threshold, they become “untrusted”, their judgments are tainted, and they are no longer allowed to participate in that job. Contributors who maintain an accuracy above this threshold are considered “trusted.”
Each data row has a state that describes its status. The states available to a row are:
- New – Initial state upon data load. These rows have not collected any judgments yet.
- Judgable – The unit has been ordered and is awaiting judgment collection.
- Judging – The unit is in the process of collecting judgments.
- Judged – Enough judgments have been collected, but the unit may revert to ‘Judgable’ if its confidence score is not high enough at any point.
- Finalized – The unit has collected enough trusted judgments to satisfy its requirements.
- Golden – Indicates a test question.
- Hidden (hidden_gold) – Indicates a test question that has been disabled.
- Canceled – The unit has been canceled and is only resumable by an admin user manually changing the state.
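The row lifecycle above can be sketched as a small state machine. This is an illustrative simplification, not the platform’s exact internal model:

```python
from enum import Enum

class RowState(Enum):
    """Row (unit) states described above; a simplified, illustrative model."""
    NEW = "new"
    JUDGABLE = "judgable"
    JUDGING = "judging"
    JUDGED = "judged"
    FINALIZED = "finalized"
    GOLDEN = "golden"            # test question
    HIDDEN_GOLD = "hidden_gold"  # disabled test question
    CANCELED = "canceled"

# Simplified forward transitions; note that 'Judged' can revert to
# 'Judgable' if the confidence score is not high enough.
TRANSITIONS = {
    RowState.NEW: {RowState.JUDGABLE},
    RowState.JUDGABLE: {RowState.JUDGING},
    RowState.JUDGING: {RowState.JUDGED},
    RowState.JUDGED: {RowState.JUDGABLE, RowState.FINALIZED},
}
```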
Each job, at any given time, will be in one of the following states:
- Unordered – Job has not been launched. It may or may not contain units, but never contains judgments.
- Running – Job has been launched and is currently collecting judgments.
- Paused – Job has been paused. A paused job will always contain units, but may or may not contain judgments. The job can be resumed at any time.
- Canceled – Job was canceled. It may contain units and judgments but is not currently running.
- Finished – All ordered units have finished collecting judgments.
- Disabled – Unit data and judgments are present. The job can be copied with all of its rows.
- Archived – Job has been in the finished state for 90+ days. It can be restored as needed.
- Launching – Job is in the process of being launched. It typically appears between the ‘Not Launched’ and ‘Running’ states, and only for a few seconds.
CML (Custom Markup Language)
CML is Appen's own markup language, which features a broad collection of specialized questions that support a wide array of use cases. You can read more about CML in the Appen Success Center. Within CML, questions are data inputs that allow contributors to submit work through your job's user interface. They allow you to dictate the type of answer you receive for each question you ask in your job. CML provides a variety of common question formats (e.g., text inputs, radio buttons, checkboxes). At least one question is required to launch your job.
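For illustration, a minimal CML radio-button question might look like the snippet below. The question text, `name`, and answer options are made up for this example; consult the Appen Success Center for the authoritative attribute reference:

```xml
<!-- A single required radio-button question; labels and names are illustrative. -->
<cml:radios label="What is the sentiment of this review?" name="sentiment" validates="required">
  <cml:radio label="Positive" value="positive" />
  <cml:radio label="Negative" value="negative" />
</cml:radios>
```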
Once a job is complete, all of the judgments on a row of data will be aggregated with a confidence score. The confidence score describes the level of agreement between multiple contributors (weighted by each contributor’s trust score), and indicates our “confidence” in the validity of the aggregated answer for each row of data. The aggregate result is chosen based on the response with the greatest confidence.
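A trust-weighted vote of this kind can be sketched as follows. The judgment values and trust scores are made up, and the platform’s exact aggregation formula is not reproduced here; this is only meant to show the general idea of weighting answers by contributor trust and picking the highest-confidence response:

```python
# Each judgment pairs a contributor's answer with that contributor's trust score.
judgments = [
    ("cat", 0.9),  # (answer, trust score) -- illustrative values
    ("cat", 0.8),
    ("dog", 0.7),
]

# Sum the trust behind each distinct answer.
weights = {}
for answer, trust in judgments:
    weights[answer] = weights.get(answer, 0.0) + trust

# Normalize to get a confidence score per answer, then take the top response.
total = sum(weights.values())
confidence = {answer: w / total for answer, w in weights.items()}
best = max(confidence, key=confidence.get)
print(best, round(confidence[best], 3))  # cat 0.708
```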
Before a contributor can enter your job, they must pass quiz mode, which is composed entirely of test questions. This ensures that only contributors who prove they can complete your job accurately are able to enter it. Contributors who fail quiz mode are not paid and are disqualified from working on the job.
Throughput is the speed at which the contributors complete your job. This is measured as the number of finalized rows of data per hour.
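As a quick illustration of that measurement, with made-up numbers:

```python
# Throughput = finalized rows of data per hour (numbers are illustrative).
finalized_rows = 300
elapsed_hours = 2.5

throughput = finalized_rows / elapsed_hours
print(throughput)  # 120.0 finalized rows per hour
```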