In this article, you will find all of the key terms that are commonly seen and used when working with Quality Flow (QF) Projects on the Appen Data Annotation Platform (ADAP).
Project Structure
Project
A Quality Flow Project is the top-level container for your data and jobs. All units are uploaded at the project level first, and then routed to individual jobs within the project. The project-level Dataset is the single source of truth for your data, always reflecting the latest results in columns prefixed with "latest...".
Job
A Job is a task within a project where contributors annotate or review data. There are three types of job in Quality Flow:
-
Work Job — The primary annotation job where contributors submit judgments on data units. Only Work Jobs can collect multiple judgments per row.
-
Following QA Job — A quality assurance job that branches directly off a Work Job on the Jobs Canvas. It inherits the Work Job's design automatically (the design can still be modified) and adds scoring and rejection options plus a feedback box, allowing expert reviewers to evaluate contributor work, send feedback, and return units for rework.
-
Leading QA Job — A QA job created directly from the project's ALL DATA source. Used for reviewing pre-annotated data or for further examination and revision of completed Work Jobs. The feedback & rework loop is not available for Leading QA Jobs.
Jobs Canvas
The visual interface where you can see and manage the relationships between your jobs. Following QA Jobs appear on the canvas branching off the Work Jobs they review. Grey circles between jobs are used to configure routing settings such as sample rates and filter criteria.
Job States
Every job in Quality Flow moves through three states. The state controls what is configurable and whether contributors can access the job:
-
Draft — The job is still being created. All settings and design are fully editable. The job is not visible to contributors and units cannot yet be sent to it.
-
Paused — A job enters this state when you click "Start Data Routing". Some settings are now locked: Judgments per Row, Dynamic Judgments, and External contributor settings cannot be changed after the job leaves Draft. The job design can still be edited. Units can be sent to the job and will route automatically, but the job is not yet open to contributors.
-
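Running — The job is open to contributors. The design can still be modified, but other settings are locked while the job is running. Pausing the job makes them configurable again, except for the settings that were locked permanently when the job left Draft.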
Data & Units
Unit
A single item of data in your project dataset. Units are uploaded to the project level first, and then sent to jobs for contributors to work on. Each unit has a status tracking its progress through the workflow.
Unit Status
Each unit has a status describing where it is in the workflow. The following statuses appear in the project summary bar at the top of the DATASET tab:
-
New — Units that have never been assigned to any job.
-
Judgable — Units that are in a job and ready to be worked on, but no judgments have been collected yet in that job.
-
Working — Units currently being worked on by contributors. In jobs collecting multiple judgments, a unit remains Working until all required judgments are submitted.
-
Submitted — All required judgments for the unit have been collected.
-
Abandoned — Units that have reached the maximum number of allowed abandons and require manual action.
-
Golden — Units that have been designated as test questions in any job. Visible in the dataset table.
The following statuses appear when there is a following QA job:
-
QA Accepted — The unit passed QA review with no modifications made by QA.
-
QA Rejected — The unit failed QA review with no modifications made by QA.
-
QA Modified | Accepted — The unit's original annotation was edited by a QA contributor, but the unit was still evaluated as being of sufficient quality. This is reflected in the original contributor's quality score.
-
QA Modified | Rejected — The unit's original annotation was edited by a QA contributor, and the unit failed evaluation. This is reflected in the original contributor's quality score.
Raw Progress
A project-level metric visible in the dataset summary bar, calculated as:
Raw Progress = Units Raw Submissions ÷ (Total Units − New)
This gives an overall percentage of routed units (all units except those still New) that have received at least one submission in any job.
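As a worked illustration of the formula above, the sketch below computes Raw Progress from status counts. It assumes that "Units Raw Submissions" is simply the count of units with at least one submission; the function name and its parameters are hypothetical and not part of any ADAP API.

```python
def raw_progress(raw_submissions: int, total_units: int, new_units: int) -> float:
    """Raw Progress as a percentage: raw submissions / (total units - New)."""
    routed = total_units - new_units  # units that have been sent to at least one job
    if routed <= 0:
        return 0.0  # nothing routed yet, so there is no progress to report
    return 100.0 * raw_submissions / routed


# Example: 1,000 units uploaded, 200 still New, 400 with at least one submission
print(raw_progress(raw_submissions=400, total_units=1000, new_units=200))  # 50.0
```

Note that New units are excluded from the denominator, so uploading more data without routing it does not lower Raw Progress.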
Data Group
A tag applied to a subset of units in the project dataset, allowing you to filter, prioritise, and manage specific collections of data more easily. Created by selecting units and choosing Actions > Create Data Group. Data Groups appear as a tab next to All Data in the dataset view.
Unit Prioritisation
A setting that marks selected units as high priority, ensuring they are worked on first in both Work and QA jobs. Priority is set at the project level and applies across all jobs until explicitly removed. Priority needs to be set before routing.
Unit Groups (Segmented Projects)
A feature that allows you to group multiple units together into meaningful subsets, for example sequential segments of a longer audio or video file. Unit groups ensure that one contributor works on all segments of a group in sequence, preserving context. To use unit groups, your data upload must include a _UNIT_GROUP column. Note that in projects using unit groups, test questions are supported only in Quiz Only mode, not in Quiz + Work mode.
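For illustration, the sketch below builds an upload CSV in which each segment of a longer recording is one row, and all segments of the same recording share a _UNIT_GROUP value. Only the _UNIT_GROUP column name comes from this article; the audio_url and segment_index columns, the recording IDs, and the example.com URLs are hypothetical placeholders for your own data.

```python
import csv
import io

# Hypothetical segments: (source recording, segment index, segment URL)
segments = [
    ("call_001", 1, "https://example.com/call_001/seg1.wav"),
    ("call_002", 0, "https://example.com/call_002/seg0.wav"),
    ("call_001", 0, "https://example.com/call_001/seg0.wav"),
]


def build_upload_csv(rows) -> str:
    """Return CSV text with one row per segment and a shared _UNIT_GROUP per recording."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["audio_url", "segment_index", "_UNIT_GROUP"])
    writer.writeheader()
    # Sort so each group's segments are listed together and in sequence
    for group, index, url in sorted(rows, key=lambda r: (r[0], r[1])):
        writer.writerow({"audio_url": url, "segment_index": index, "_UNIT_GROUP": group})
    return buffer.getvalue()


print(build_upload_csv(segments))
```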
Retain Grouping
A job setting for projects with unit groups that ensures all segments of a group are always kept together and assigned to a single contributor as one task. If selected in a Work Job, any Following QA Job will permanently inherit this setting. This cannot be changed after data has started routing.
Abandoned Units
Units that a contributor started but did not complete. You can configure a Max Number of Abandons to control how many times an abandoned unit returns to the contributor pool before it is permanently marked as ABANDONED and requires manual action to release.
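The abandon limit can be pictured as a simple counter. The sketch below is a hypothetical model, not the platform's implementation: it assumes a unit is marked ABANDONED once its abandon count reaches the configured Max Number of Abandons, and otherwise returns to the contributor pool. The AVAILABLE label is an invented placeholder for "back in the pool".

```python
from dataclasses import dataclass


@dataclass
class Unit:
    unit_id: str
    abandon_count: int = 0
    status: str = "WORKING"


def record_abandon(unit: Unit, max_abandons: int) -> Unit:
    """Register one abandon and decide whether the unit goes back to the pool."""
    unit.abandon_count += 1
    if unit.abandon_count >= max_abandons:
        unit.status = "ABANDONED"  # stays here until someone releases it manually
    else:
        unit.status = "AVAILABLE"  # hypothetical label: back in the contributor pool
    return unit


u = Unit("u1")
for _ in range(3):
    record_abandon(u, max_abandons=3)
print(u.status, u.abandon_count)  # ABANDONED 3
```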
Judgments & Agreement
Judgment
A judgment is the complete set of answers submitted by a contributor for a single unit. Quality Flow projects can collect between 1 and 200 judgments per row. Each judgment is associated with a unique judgment ID and a specific contributor.
-
In Unit View, all judgments for a unit are collapsed into one row.
-
In Judgment View, each judgment is visible in its own row with a unique judgment number.
Multiple Judgments
The ability to collect more than one judgment per unit from different contributors, for the purpose of measuring agreement or improving data quality. Once a job has started data routing, the number of judgments to collect cannot be changed. Only Work Jobs support multiple judgments, and Following QA Jobs are not available for Work Jobs that collect multiple judgments.
Dynamic Judgments
An enhancement to multiple judgments that stops collecting judgments as soon as a pre-defined agreement threshold has been reached, saving time and cost. You configure a minimum number of judgments to always collect and a maximum cap. If agreement is reached early, the unit finalises before hitting the maximum. Agreement can be assessed by:
-
Matching — A set number of contributors must select the same answer.
-
Confidence — The weighted confidence in an answer (based on contributor trust scores) must meet a minimum threshold.
Dynamic Judgments can only be configured before "Start Data Routing" is activated, and settings cannot be edited after that point.
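The stopping rule can be sketched as follows. This is a simplified model under stated assumptions: confidence is taken to be the summed trust score of contributors who chose the leading answer divided by the summed trust of everyone who answered (a common trust-weighted definition, assumed here rather than taken from this article). The function name, parameter names, and default thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def should_stop(judgments, min_judgments, max_judgments,
                mode="matching", matching_required=2, confidence_threshold=0.7):
    """Decide whether a unit can finalise under Dynamic Judgments.

    judgments: list of (answer, trust_score) pairs collected so far.
    """
    n = len(judgments)
    if n == 0 or n < min_judgments:
        return False  # always collect the minimum first
    if n >= max_judgments:
        return True  # hard cap reached
    if mode == "matching":
        # Enough contributors selected the same answer
        top_count = Counter(answer for answer, _ in judgments).most_common(1)[0][1]
        return top_count >= matching_required
    if mode == "confidence":
        # Trust-weighted share of the leading answer (assumed definition)
        weight = defaultdict(float)
        for answer, trust in judgments:
            weight[answer] += trust
        total = sum(weight.values())
        return total > 0 and max(weight.values()) / total >= confidence_threshold
    raise ValueError(f"unknown mode: {mode}")


print(should_stop([("cat", 0.9), ("cat", 0.8)], min_judgments=2, max_judgments=5))  # True
```

With mode="confidence", a split such as [("cat", 0.9), ("dog", 0.6)] gives 0.9 / 1.5 = 0.6, which is below the example threshold of 0.7, so another judgment would be collected.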
Agreement
For form-based questions (radios, ratings, and checkboxes), agreement per row is available in the dataset table and downloads. Agreement indicates how consistently contributors answered the same way on a given unit. Worker agreement (Wawa) can be viewed in the Contributor Dashboard.
Confidence Score
A measure of the level of agreement between multiple contributors on a single unit, weighted by each contributor's trust score. Confidence is used in Dynamic Judgments to determine when a unit has sufficient agreement to be finalised. Trust score is measured by test questions, so confidence should only be used in conjunction with test questions.
Inter-Annotator Agreement (Krippendorff's Alpha)
A reliability score that measures how consistently multiple annotators agree on answers, adjusted for chance agreement. Available for radio and checkbox questions. Values range from −1 to 1:
-
< 0 — Systematic disagreement
-
0.0–0.2 — Slight agreement
-
0.21–0.4 — Fair agreement
-
0.41–0.6 — Moderate agreement
-
0.61–0.8 — Substantial agreement
-
0.81–1.0 — Near-perfect agreement
Downloadable via the IAA Report at the job level under Job > Results.
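To make the score concrete, the sketch below computes Krippendorff's alpha for nominal data (the natural fit for single-answer radio questions) using the standard coincidence-matrix formulation, then maps the result onto the bands listed above. It is an independent illustration, not the code behind the IAA Report. Checkbox questions, which allow several answers, would need a different distance metric that is not shown here.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    units: list of lists; each inner list holds the labels different annotators
    gave one unit. Units with fewer than two labels cannot form pairs and are skipped.
    """
    coincidences = Counter()  # (label_c, label_k) -> weighted pair count o_ck
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):  # ordered pairs of different annotators
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()  # n_c: marginal count of each label among pairable values
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - (n - 1) * observed / expected


def interpret_alpha(alpha):
    """Map alpha to the bands above; values between band edges go to the lower band."""
    if alpha < 0:
        return "Systematic disagreement"
    for upper, label in [(0.2, "Slight agreement"), (0.4, "Fair agreement"),
                         (0.6, "Moderate agreement"), (0.8, "Substantial agreement")]:
        if alpha <= upper:
            return label
    return "Near-perfect agreement"


alpha = krippendorff_alpha_nominal([["a", "a"], ["b", "b"], ["a", "b"]])
print(round(alpha, 3), interpret_alpha(alpha))  # 0.444 Moderate agreement
```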
Quality Assurance
Test Questions
Pre-labeled units (also called "gold data") used to assess and monitor contributor quality. In Quality Flow, test questions are always created from existing units in the project dataset — no new units are created. Test questions serve two purposes:
-
Assessing contributors before they begin work (Quiz Mode)
-
Monitoring contributors while they work (Work Mode)
Test questions are supported in projects using Unit Groups, but only in Quiz Only jobs.
Quiz Only vs Quiz + Work
-
Quiz Only — Contributors will only see test questions; no regular work units are included.
-
Quiz + Work — Contributors must pass a quiz first, then proceed to work on regular units, with test questions interspersed at a set frequency.
Minimum Accuracy
The accuracy threshold a contributor must maintain on test questions in order to remain in a job. If a contributor's accuracy falls below this threshold, they are removed from the job. In Quiz Mode, it determines whether a contributor passes or fails the quiz.
Trust Score
A contributor's score based on their performance on test questions within a job. Trust scores are used as a weighting factor when calculating confidence on units in jobs that use multiple judgments. Trust scores are visible in the Contributor Dashboard, and downloadable via the Contributor Daily Report.
QA Sampling
The percentage of a contributor's submitted work that is routed to a Following QA Job for review. Configured by clicking the grey circle between the Work Job and the QA Job on the Jobs Canvas. A minimum number of units per contributor can also be specified. Sampling rates can be adjusted while a QA job is running (by pausing, adjusting, and resuming), but changes only apply to future submissions.
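As a rough model of how a sampling rate and a per-contributor minimum could combine, the sketch below selects submissions for QA. The rounding rule (round up) and the way the minimum overrides the percentage are assumptions for illustration only; the platform's exact selection logic is not documented here, and the function name is hypothetical.

```python
import math
import random
from collections import defaultdict


def sample_for_qa(submissions, rate, min_per_contributor=0, seed=None):
    """Pick submissions to route to a Following QA Job.

    submissions: list of (contributor_id, unit_id) pairs.
    rate: sampling rate between 0 and 1 (e.g. 0.1 for 10%).
    """
    rng = random.Random(seed)
    by_contributor = defaultdict(list)
    for contributor_id, unit_id in submissions:
        by_contributor[contributor_id].append(unit_id)
    sampled = []
    for contributor_id, units in by_contributor.items():
        # Percentage of this contributor's work, but never below the minimum
        k = max(math.ceil(rate * len(units)), min_per_contributor)
        k = min(k, len(units))  # cannot sample more units than were submitted
        sampled.extend((contributor_id, u) for u in rng.sample(units, k))
    return sampled
```

For example, with a 50% rate and a minimum of 3, a contributor with 10 submissions would have 5 sampled, while a contributor with a single submission would have that one unit sampled.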
Quality Assurance Classes
Customisable categories used in Following QA Jobs to label the type or severity of feedback left by QA contributors. Each class is assigned a severity score between 0 (no issue) and 1 (very poor quality). Classes interact with the Rejection Threshold to determine whether a unit is Accepted or Rejected, and drive quality scoring metrics. Classes can optionally be hidden from the original contributor.
Rejection Threshold
A numeric value (between 0 and 1) set in the QA job configuration. If the total severity of all QA classes selected for a unit equals or exceeds the threshold, the unit is considered Rejected. If it falls below the threshold, the unit is Accepted.
Rework
An advanced QA option that sends a rejected unit back to the original contributor for revision. Requires Quality Assurance Classes to be configured, as the rejection threshold controls when rework is triggered. A unit can be sent back for rework a maximum of two times; after a third failed attempt, it will not be sent back again.
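The interaction between class severities, the Rejection Threshold, and the rework limit can be sketched as below. The class names and severity values are hypothetical examples; only the 0-to-1 severity scale, the "equals or exceeds" rule, and the limit of two rework rounds come from this article.

```python
MAX_REWORKS = 2  # per this article: a unit is sent back at most twice

# Hypothetical QA classes and severity scores (0 = no issue, 1 = very poor quality)
QA_CLASSES = {"minor_typo": 0.1, "missed_label": 0.5, "wrong_label": 0.8}


def qa_decision(selected_classes, rejection_threshold):
    """Rejected if the total severity of the selected classes meets or exceeds the threshold."""
    severity = sum(QA_CLASSES[name] for name in selected_classes)
    return "Rejected" if severity >= rejection_threshold else "Accepted"


def next_step(selected_classes, rejection_threshold, reworks_so_far):
    """Decide what happens to a QA'd unit when rework is enabled."""
    if qa_decision(selected_classes, rejection_threshold) == "Accepted":
        return "accept"
    if reworks_so_far < MAX_REWORKS:
        return "send back for rework"
    return "reject without further rework"  # third failed attempt
```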
Job Quality Score
A metric available in the Quality Dashboard when Quality Assurance Classes are used. Calculated as:
Job Quality Score = 1 − (Sum of all QA class severities across all units ÷ Total QA'd units)
A score closer to 100% indicates fewer or less severe errors across the job.
Acceptance Score
A metric in the Quality Dashboard reflecting the proportion of units accepted by QA without any modification or rejection:
Acceptance Score = (Number of accepted units without QA changes ÷ Total judgments) × 100
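Both Quality Dashboard formulas above are simple ratios. The sketch below applies them to made-up counts purely to show the arithmetic; the function names are hypothetical, and Job Quality Score is expressed as a percentage to match how the dashboard describes it.

```python
def job_quality_score(unit_severities):
    """1 - (sum of QA class severities / total QA'd units), as a percentage.

    unit_severities: total severity for each QA'd unit (0 if no class was selected).
    """
    return 100.0 * (1 - sum(unit_severities) / len(unit_severities))


def acceptance_score(accepted_unchanged, total_judgments):
    """(accepted units without QA changes / total judgments) x 100."""
    return 100.0 * accepted_unchanged / total_judgments


# Four QA'd units, one of which received classes totalling 0.5 severity
print(job_quality_score([0, 0, 0.5, 0]))  # 87.5
print(acceptance_score(3, 4))  # 75.0
```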
Data Routing
Unit Routing
The process of moving units from one job to another within a Quality Flow project. Units can be routed manually via Actions > "Send to Job" on the dataset, or automatically via Recursive Sampling. When routing, two key settings control how existing judgments are handled:
-
Carry Over — Whether existing judgments from the source job will be visible to contributors in the destination job.
-
Overwrite — Whether work done in the destination job will replace the original judgments, or be recorded as additional judgments alongside them.
Common use case combinations:
| | Overwrite ON | Overwrite OFF |
|---|---|---|
| Carry Over ON | Revise Judgments: second job revises and replaces first job's work | Review & Add: second job can see and revise, but original is kept |
| Carry Over OFF | Replace Judgments: second job provides fresh judgments, discarding originals | Add Judgments: collect additional independent judgments for the same units |
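The two switches can be modelled independently: Carry Over controls what destination contributors see, and Overwrite controls which judgments are kept afterwards. The sketch below restates the combinations in code, for example for validating routing plans in your own tooling; the function names and labels are hypothetical and do not correspond to any ADAP API.

```python
def routing_use_case(carry_over: bool, overwrite: bool) -> str:
    """Name the use case for a Carry Over / Overwrite combination."""
    cases = {
        (True, True): "Revise Judgments",
        (True, False): "Review & Add",
        (False, True): "Replace Judgments",
        (False, False): "Add Judgments",
    }
    return cases[(carry_over, overwrite)]


def route_judgments(existing, new_work, carry_over, overwrite):
    """Return (judgments visible in the destination job, judgments kept afterwards)."""
    visible = list(existing) if carry_over else []  # Carry Over: show prior work
    kept = list(new_work) if overwrite else list(existing) + list(new_work)
    return visible, kept


print(routing_use_case(carry_over=True, overwrite=False))  # Review & Add
print(route_judgments(["j1"], ["j2"], carry_over=False, overwrite=False))  # ([], ['j1', 'j2'])
```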
Recursive Sampling / Filter Criteria
A routing configuration set on the grey circles leading from the ALL DATA box to a target job on the Jobs Canvas. Allows you to define filtering criteria and a frequency so that only units matching certain conditions are automatically routed into a job. When filter criteria are active, percentage sampling is always 100%. Filter criteria can only be set for one job at a time.
Saved Filters
Commonly used filter configurations on the dataset table that can be saved by name and retrieved for future use. Filters saved in Unit View are available only in Unit View, and filters saved in Judgment View are available only in Judgment View.
Contributors
Contributor
Contributors are the people who work on your jobs and are compensated for their submissions. Individual contributors are identifiable by a unique Contributor ID.
Internal Contributor Dashboard
A contributor-facing dashboard available to internal contributors at https://account.appen.com/internal/tasks. Internal contributors can view and access all their assigned tasks, organised by status: New, In Progress, and Completed. Tasks display the task name, overview, classification tags (inherited from the tools used in the job), and current status. Tasks move to Completed when no available work remains, and return to In Progress when new data becomes available.
Dashboards
Quality Flow projects have four dashboards accessible from the DASHBOARDS tab, plus a downloadable reports section. A Question Settings button on the dashboard allows you to include or exclude specific form and text question metrics from all views.
Productivity Dashboard
Tracks the progress of units per job and overall project throughput. Contains two sections:
-
Progress — Displays unit counts per job by status: Not Started, Working, Submitted, and Resolving. Filterable by job type.
-
Throughput — Displays overall work rate (units completed per hour), number of active contributors, and total contributor hours. Filterable by job type, date range, and time slot.
Quality Dashboard
Displays QA metrics for the project, including acceptance rate, rejection reasons (if configured), and question-level accuracy for form tools. For audio transcription jobs, additional metrics such as Word Error Rate (WER) and Tag Error Rate (TER) are available. A Detailed View provides an error breakdown by error type and contributor, downloadable as a report. The dashboard is dynamic and adapts based on the QA settings configured for the job.
Contributor Dashboard
Displays performance metrics per contributor, including trust scores, agreement scores, quality and acceptance scores, and throughput information. Also includes a panel for comparing QA checker behaviour across contributors. This dashboard is searchable.
Feedback Dashboard
Visible when the Feedback loop is enabled. Shows the status of all feedback items and is where the Project Owner goes to arbitrate any disputed feedback. The Project Owner can choose to revert to the original contributor's work or approve the QA contributor's modifications. Unit feedback statuses are:
-
New — The feedback has not yet been reviewed by the original contributor.
-
Resolved — The feedback has been acknowledged and accepted by the original contributor, or the dispute has been resolved by the Project Owner.
-
Disputed — The original contributor has contested the feedback. Requires Project Owner arbitration.
Reports & Downloads
Reports in Quality Flow are available at two levels: project-level (from the DATASET tab and the DASHBOARDS > Reports (Downloadable) tab) and job-level (from Job > Results).
Dataset Download
The project-level export of your data, always reflecting the latest results in latest.judgment.{field} columns. No regeneration is needed. Downloads can be filtered and column-limited before export. For jobs with multiple judgments, downloading in Judgment View ensures the full dataset (including empty rows) is captured consistently. To compare older judgments, parse the JSON data in extra.allCommitsBin from the Dataset Report.
Job Daily Report
A project-level downloadable report providing throughput and quality metrics per job, broken down by day over the life of the project. Key fields include total submitted units, working hours, units approved/modified/rejected in QA, acceptance/accuracy rate, and (where applicable) detailed error metrics such as word and tag error rates. Available from DASHBOARDS > Reports (Downloadable).
Contributor Daily Report
A project-level downloadable report providing the same metrics as the Job Daily Report, but broken down per contributor per day. Includes trust scores and QA performance per contributor. Available from DASHBOARDS > Reports (Downloadable). Contributor quiz scores and test question answers can also be downloaded from the CONTRIBUTORS tab or from Job > Results.
JSON Report
A project-level download of the full dataset report (all columns) in JSON format. Available from DASHBOARDS > Reports (Downloadable).
Source Report
A download of your original source data file as uploaded. Available from DASHBOARDS > Reports (Downloadable).
Test Question Report
A job-level report downloadable from the Quality tab that includes comprehensive information about each test question: performance metrics (% missed, % contested, contention), configuration settings, gold answers, and quiz/work mode. This report can also be re-used as an upload template for bulk editing existing test questions or re-using test questions in duplicated jobs.
Full & Aggregated Results
Job-level reports available under Job > Results. Note that units routed between jobs will only appear in the job they are currently in; once a unit leaves a job, its judgment data is only accessible through the project-level Dataset Report. Four aggregation methods are supported:
-
Best answer (agg) — Returns the single top response.
-
All answers (all) — Returns all responses.
-
Numeric average (avg) — Returns a numeric average calculated from all results.
-
Top # Answers (agg_x) — Returns the top x responses.
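The four aggregation methods can be approximated as below for a single question on one unit. This is an illustrative sketch only: tie-breaking, trust weighting, and the exact output format of the real reports are not described in this article, so the sketch simply ranks answers by how many contributors gave them. The function name and the default x=2 are hypothetical.

```python
from collections import Counter


def aggregate(responses, method, x=2):
    """Aggregate one question's responses for one unit.

    method: "agg" (best answer), "all", "avg" (numeric average), or "agg_x" (top x answers).
    """
    if method == "all":
        return list(responses)
    if method == "avg":
        values = [float(r) for r in responses]
        return sum(values) / len(values)
    # Rank distinct answers by how many contributors gave them
    ranked = [answer for answer, _ in Counter(responses).most_common()]
    if method == "agg":
        return ranked[0]
    if method == "agg_x":
        return ranked[:x]
    raise ValueError(f"unsupported method: {method}")


votes = ["cat", "dog", "cat", "bird", "dog", "cat"]
print(aggregate(votes, "agg"))  # cat
print(aggregate(votes, "agg_x", x=2))  # ['cat', 'dog']
```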
IAA Report (Krippendorff's Alpha)
A job-level report downloadable from Job > Results. Provides a Krippendorff's Alpha reliability score for each radio or checkbox question in the job design, measuring annotator consistency adjusted for chance agreement. See the Inter-Annotator Agreement entry above for score interpretation.
Project Management
Copy Project
Projects can be copied from the Projects list (via the three-dot menu) or from inside the project (via the copy icon). When copying, you can choose to carry over job settings (excluding contributor settings, which should always be re-verified) and select what dataset content to include:
-
Test Questions Only — Copy only the gold test question data.
-
Copy All Units — Copy all units including test questions (existing judgments are not copied).
-
Copy New Units — Copy only units that have not yet been worked on.