Overview
Reports may be generated throughout the lifespan of your job to provide you with accurate information on the data you have uploaded, the performance of your crowd, and the results they have submitted. The Reports page can be accessed by clicking Results in the top navigation. For this article, please note that the term "unit" refers to a row and "gold" refers to a test question.
Results
Fig. 1: Reports Page
Accessing Reports
After your job has begun, the Results tab will display several options, each containing an up-to-date, real-time collection of the results the job has gathered so far. You can download any of these reports by clicking the Download button.
Options
Click on the 'Options' tab to the right of the 'Results' tab to customize the data included in the reports.
Fig. 2: Options Page
Aggregated Results Settings
Fig. 3: Aggregation Options for Each Result Column
The aggregation settings allow you to select the aggregation method for each question in your job. Each question produces a column in the generated report using the aggregation method you set:
- Best Answer ('agg') - Returns the highest-confidence response
- All Answers ('all') - Returns every response submitted
- Numeric Average ('avg') - Returns a numeric average calculated from all responses
- Top # Answers ('agg_x') - Returns the top 'x' responses
- Bounding Box Aggregation ('bagg_x') - For bounding box image annotation jobs only; returns box responses that overlap above an Intersection over Union threshold of 'x' (a number between 0 and 1)
- Confidence Aggregation ('cagg_x') - Returns the answers that are above 'x' confidence (a number between 0 and 1; see the example after this list)
- Text Annotation Aggregation ('tagg') - For text annotation jobs only; returns a link to a JSON file that describes the text, tokens, and spans. Each labeled span receives an inter-annotator agreement score titled 'confidence'.
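For example, to return only the answers that reach 0.7 confidence on a multiple-choice question, you could set the parameterized aggregation string directly. A minimal sketch using cml:checkboxes (the labels are illustrative):

    <cml:checkboxes label="Which topics apply?" aggregation="cagg_0.7">
      <cml:checkbox label="Sports" />
      <cml:checkbox label="Politics" />
    </cml:checkboxes>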
Here is a list of each CML question with its default aggregation method:
- cml:text – all answers (aggregation="all")
- cml:textarea - all answers (aggregation="all")
- cml:checkbox - best answer (aggregation="agg")
- cml:checkboxes - best answer (aggregation="agg")
- cml:radios - best answer (aggregation="agg")
- cml:select - best answer (aggregation="agg")
- cml:ratings - numeric average (aggregation="avg")
- cml:text_annotation - text annotation aggregation (aggregation="tagg")
Because these aggregations are set by default, they do not need to be explicitly written in the raw code. For example, this (a minimal sketch; your question labels and options will differ):
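    <cml:radios label="Is this page relevant?" validates="required">
      <cml:radio label="Yes" />
      <cml:radio label="No" />
    </cml:radios>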
Is identical to:
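    <cml:radios label="Is this page relevant?" validates="required" aggregation="agg">
      <cml:radio label="Yes" />
      <cml:radio label="No" />
    </cml:radios>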
If you're using the image annotation tool (cml:shapes), your reports will include shape-specific aggregation columns, depending on what shapes are included in the job design. For information on aggregation, refer to the following articles:
- Bounding boxes
- Polygons
- Dots
- Lines
Report Types
There are six types of reports that can be generated and downloaded after the job has begun running:
Text annotation jobs have an additional report type, described below.
Full Report
This report contains each individual judgment submitted in the job. It includes all of your original source columns, as well as the following (a short usage sketch follows this list):
- _unit_id: A unique ID number created by the system for each row
- _created_at: The time the contributor submitted the judgment (UTC time zone)
- _golden: This will be "true" if this is a test question; otherwise it is "false"
- _id: A unique ID number generated for this specific judgment
- _missed: This will be "true" if the judgment on a test question was incorrect
- _started_at: The time at which the contributor started working on the judgment (UTC time zone)
- _tainted: This will be "true" if the contributor has been flagged for falling below the required accuracy. This judgment will not be used in the aggregation.
- _channel: The work channel through which the contributor accessed the job
- _trust: The contributor's accuracy. Learn more about trust here
- _worker_id: A unique ID number assigned to the contributor
- _country: The country the contributor is from
- _region: A region code for the area the contributor is from
- _city: The city the contributor is from
- _ip: The IP address of the contributor
- {{field}}: There will be a column for each field in the job, with a header equal to the field's name
- {{field}}_gold: The correct answer for the test question
Note: The term "unit" refers to row
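Because each row of the Full Report is a single judgment, downstream analysis usually starts by filtering out tainted judgments. A minimal sketch in Python, assuming the report was downloaded as a CSV with the hypothetical filename "full_report.csv" and that pandas is available:

    # Minimal sketch: drop tainted judgments from a downloaded Full Report.
    # "full_report.csv" is a hypothetical filename; adjust to your download.
    import pandas as pd

    df = pd.read_csv("full_report.csv")
    # _tainted is written as "true"/"false" in the CSV, so compare as strings.
    clean = df[df["_tainted"].astype(str).str.lower() != "true"]
    print(clean.groupby("_worker_id")["_trust"].mean())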
Aggregated Report
This report aggregates all of the responses for each individual row in the job. Each field is aggregated according to the method defined under the Options tab on the Results page. Each response is paired with a confidence score, which is the agreement across all answers given, weighted by contributor trust. The report contains all of your original source columns, as well as:
- _unit_id: A unique ID number created by the system for each row
- _golden: This will be "true" if this is a test question; otherwise it is "false"
- _unit_state: This will be "finalized" if the row has collected all judgments needed, "new" if it has not been launched yet, "judgable" if it requires more judgments, and "golden" if the row is a test question
- _trusted_judgments: The number of non-tainted judgments the row has accumulated
- _last_judgment_at: The time the latest judgment was received
- {{field}}: There will be a column for each field in the job, with a header equal to the field's name
- {{field}}:confidence: If you choose to include confidence values, these columns will be included. They represent the level of agreement between contributors. For more on calculating confidence scores, see here and the sketch after this list.
- {{field}}:stddev: You will only see this column if your aggregation setting is set to Numeric Average ('avg'). This column represents the standard deviation of the judgments for that field.
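To make the confidence column concrete: it is the trust-weighted agreement across the judgments on a row. A minimal sketch in Python, assuming each judgment is an (answer, trust) pair; the platform's exact formula may include additional adjustments:

    # Minimal sketch of trust-weighted confidence. Each judgment is an
    # (answer, trust) pair; the platform's exact formula may differ.
    from collections import defaultdict

    def confidence_scores(judgments):
        total_trust = sum(trust for _, trust in judgments)
        weight = defaultdict(float)
        for answer, trust in judgments:
            weight[answer] += trust
        # Confidence = trust supporting an answer / total trust on the row.
        return {answer: w / total_trust for answer, w in weight.items()}

    print(confidence_scores([("yes", 0.9), ("yes", 0.8), ("no", 0.7)]))
    # -> {'yes': 0.708..., 'no': 0.291...}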
Source Report
This report includes the original, unprocessed data that was uploaded to the job before it was run. If you do not select 'Exclude Test Questions', this report will also include the test question answers and reasons created for the job.
- _unit_id: A unique ID number created by the system for each row
- _created_at: The time the row was first uploaded into the job (UTC time zone)
- _updated_at: The time the row was most recently changed
Test Questions Report
This report includes only data on the test question rows in the job.
- _id: A unique ID number created by the system for each test question
- _pct_missed: The percentage of responses that were incorrect
- _judgments: The total number of judgments this test question received
- _hidden: This will be "true" if the test question is disabled
- _contention: The contentions submitted by contributors, separated by newlines
- _pct_contested: The percentage of contributors who answered the test question incorrectly and contested it
- _gold_pool: This will be "quiz" for test questions set only to quiz mode, "work" for test questions set only to work mode, and blank or "both" for test questions set to both. Contact your CSM to enable this UI feature in the platform. You can also contact the Appen Platform Support team for access. To learn more, visit this article.
- {{field}}: There will be a column for each field in the job, with a header equal to the field's name
- {{field}}_gold: The correct answer for the test question
- {{field}}_gold_reason: The reason/explanation for the correct answer
Note: The term "gold" refers to Test Questions
Contributor Report
This report contains information on the performance of individual contributors in the job.
- worker_id: A unique ID number assigned to the contributor
- external_id: A unique ID number assigned to the contributor by external channels
- judgments_count: The number of judgments submitted by the contributor
- missed_count: The number of test questions answered incorrectly by the contributor
- golds_count: The number of test questions answered by the contributor
- forgiven_count: The number of test questions answered incorrectly by the contributor that were then forgiven
- channel: The work channel through which the contributor accessed the job
- country: The country the contributor is from
- region: A region code for the area the contributor is from
- city: The city the contributor is from
- last_ip: The most recent IP address for the contributor
- flagged_at: The time at which the contributor was flagged
- rejected_at: The time at which the contributor's answers were rejected
- bonus: The amount the contributor has been bonused
- flag_reason: The requester-entered reason the contributor was flagged
- trust_overall: The trust level of the contributor
- submission_rate: Judgments per hour by a contributor. This is a rounded average of judgments submitted each hour, excluding hours with zero judgments, calculated from the time of the first submission. For example, if the first submission occurs at 16:11, each hour is counted from that point: 16:11-17:10, 17:11-18:10, and so on.
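A minimal sketch of that calculation in Python, assuming a contributor's submission timestamps are available as datetime objects; the platform's exact rounding behavior may differ:

    # Minimal sketch of the submission_rate calculation described above.
    from collections import Counter
    from datetime import datetime

    def submission_rate(timestamps):
        start = min(timestamps)
        # Bucket submissions into hour-long windows anchored at the first
        # submission (e.g. 16:11-17:10, 17:11-18:10, ...).
        buckets = Counter(int((t - start).total_seconds() // 3600) for t in timestamps)
        # Average only over hours with at least one judgment, then round.
        return round(sum(buckets.values()) / len(buckets))

    print(submission_rate([
        datetime(2024, 1, 1, 16, 11),
        datetime(2024, 1, 1, 16, 40),
        datetime(2024, 1, 1, 18, 30),
    ]))  # -> 2 (3 judgments over 2 non-empty hours, rounded)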
Note: The term "gold" refers to Test Questions
JSON Report
This report contains JSON formatted data for each judgment and row. You will most likely need to use a JSON parser in conjunction with this report.
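A minimal sketch for inspecting the report in Python, assuming it is delivered as newline-delimited JSON (one object per row) and using the hypothetical filename "job_report.json":

    # Minimal sketch: inspect a JSON report, assuming one JSON object per
    # line. "job_report.json" is a hypothetical filename.
    import json

    with open("job_report.json") as f:
        for line in f:
            row = json.loads(line)
            print(sorted(row.keys()))  # inspect the structure first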
Download All Annotations Report (Text Annotation Jobs Only)
- This report is only available in text annotation jobs.
- This report directly downloads the JSON files in the job, rather than links to them as in the full/aggregated reports. If metadata from the full or aggregated report is needed, those reports should be downloaded separately.
- The report downloads as a zip file containing two folders.
- One folder contains all the aggregated judgments, labeled by unit ID.
- The other folder contains the full-report judgments, labeled by judgment ID.
- This report is more secure than the aggregated report, as no URLs are generated.
- Note: This report may take a while to generate and download due to the size of its data files. However, the download will still be much faster than running scripts to scrape the results.
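A minimal sketch for walking the downloaded archive in Python, assuming the hypothetical filename "annotations.zip"; the exact folder names inside may vary by job:

    # Minimal sketch: list and parse every JSON file in the downloaded zip.
    # "annotations.zip" is a hypothetical filename; folder names may vary.
    import json, zipfile

    with zipfile.ZipFile("annotations.zip") as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                data = json.loads(zf.read(name))
                print(name, sorted(data.keys()))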