Overview
Reports may be generated throughout the lifespan of your job to provide you with accurate information on the data you have uploaded, the performance of your crowd, and the results they have submitted. The Reports page can be accessed by clicking Results in the top navigation. For this article, please note that the term "unit" refers to a row and "gold" refers to a test question.
Results
Fig. 1: Reports Page
Accessing Reports
After your job has begun, the Results tab will display several options, each containing an up-to-date, real-time collection of the results the job has gathered so far. You can download any of these reports by clicking the Download button.
Options
Click on the 'Options' tab to the right of the 'Results' tab to customize the data included in the reports.
Fig. 2: Options Page
Aggregated Results Settings
Fig. 3: Aggregation Options for Each Result Column
The aggregation settings allow you to select the aggregation method for each question in your job. Each question produces a column in the generated report using the aggregation method you set:
- Best Answer ('agg') - Returns the highest-confidence response
- All Answers ('all') - Returns every response submitted
- Numeric Average ('avg') - Returns a numeric average calculated from all responses
- Top # Answers ('agg_x') - Returns the top 'x' responses
- Bounding Box Aggregation ('bagg_x') - For bounding box image annotation jobs only; returns box responses that overlap above an Intersection over Union threshold of 'x' (a number between 0 and 1)
- Confidence Aggregation ('cagg_x') - Returns the answers that are above 'x' confidence (a number between 0 and 1; see the example after this list)
- Text Annotation Aggregation ('tagg') - For text annotation jobs only; returns a link to a JSON file that describes the text, tokens, and spans. Each labeled span receives an inter-annotator agreement score titled 'confidence'.
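For example, to return only the answers that reach 0.7 confidence on a multiple-choice question, you could set the parameterized aggregation string directly. A minimal sketch using cml:checkboxes (the labels are illustrative):

    <cml:checkboxes label="Which topics apply?" aggregation="cagg_0.7">
      <cml:checkbox label="Sports" />
      <cml:checkbox label="Politics" />
    </cml:checkboxes>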
Here is a list of each CML question with its default aggregation method:
- cml:text – all answers (aggregation="all")
- cml:textarea - all answers (aggregation="all")
- cml:checkbox - best answer (aggregation="agg")
- cml:checkboxes - best answer (aggregation="agg")
- cml:radios - best answer (aggregation="agg")
- cml:select - best answer (aggregation="agg")
- cml:ratings - numeric average (aggregation="avg")
- cml:text_annotation - text annotation aggregation (aggregation="tagg")
Because these aggregations are set by default, they do not need to be explicitly written in the raw code. For example, this (a minimal sketch; your question labels and options will differ):
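    <cml:radios label="Is this page relevant?" validates="required">
      <cml:radio label="Yes" />
      <cml:radio label="No" />
    </cml:radios>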
Is identical to:
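    <cml:radios label="Is this page relevant?" validates="required" aggregation="agg">
      <cml:radio label="Yes" />
      <cml:radio label="No" />
    </cml:radios>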
If you're using the image annotation tool (cml:shapes), your reports will include shape-specific aggregation columns, depending on what shapes are included in the job design. For information on aggregation, refer to the following articles:
- Bounding boxes
- Polygons
- Dots
- Lines
Report Types
There are six types of reports that can be generated and downloaded after the job has begun running:
Text annotation jobs have an additional report type, described below.
Full Report
This report contains each individual judgment submitted in the job. It includes all of your original source columns, as well as the following (a short usage sketch follows this list):
- _unit_id: A unique ID number created by the system for each row
- _created_at: The time the contributor submitted the judgment (UTC time zone)
- _golden: This will be "true" if this is a test question; otherwise it is "false"
- _id: A unique ID number generated for this specific judgment
- _missed: This will be "true" if the judgment on a test question was incorrect
- _started_at: The time at which the contributor started working on the judgment (UTC time zone)
- _tainted: This will be "true" if the contributor has been flagged for falling below the required accuracy. This judgment will not be used in the aggregation.
- _channel: The work channel through which the contributor accessed the job
- _trust: The contributor's accuracy. Learn more about trust here
- _worker_id: A unique ID number assigned to the contributor
- _country: The country the contributor is from
- _region: A region code for the area the contributor is from
- _city: The city the contributor is from
- _ip: The IP address of the contributor
- {{field}}: There will be a column for each field in the job, with a header equal to the field's name
- {{field}}_gold: The correct answer for the test question
Note: The term "unit" refers to row
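Because each row of the Full Report is a single judgment, downstream analysis usually starts by filtering out tainted judgments. A minimal sketch in Python, assuming the report was downloaded as a CSV with the hypothetical filename "full_report.csv" and that pandas is available:

    # Minimal sketch: drop tainted judgments from a downloaded Full Report.
    # "full_report.csv" is a hypothetical filename; adjust to your download.
    import pandas as pd

    df = pd.read_csv("full_report.csv")
    # _tainted is written as "true"/"false" in the CSV, so compare as strings.
    clean = df[df["_tainted"].astype(str).str.lower() != "true"]
    print(clean.groupby("_worker_id")["_trust"].mean())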
Aggregated Report
This report aggregates all of the responses for each individual row in the job. Each field is aggregated according to the method defined under the Options tab on the Results page. Each response is paired with a confidence score, which is the agreement across all answers given, weighted by contributor trust. The report contains all of your original source columns, as well as:
- _unit_id: A unique ID number created by the system for each row
- _golden: This will be "true" if this is a test question; otherwise it is "false"
- _unit_state: This will be "finalized" if the row has collected all judgments needed, "new" if it has not been launched yet, "judgable" if it requires more judgments, and "golden" if the row is a test question
- _trusted_judgments: The number of non-tainted judgments the row has accumulated
- _last_judgment_at: The time the latest judgment was received
- {{field}}: There will be a column for each field in the job, with a header equal to the field's name
- {{field}}:confidence: If you choose to include confidence values, these columns will be included. They represent the level of agreement between contributors. For more on calculating confidence scores, see here and the sketch after this list.
- {{field}}:stddev: You will only see this column if your aggregation setting is set to Numeric Average ('avg'). This column represents the standard deviation of the judgments for that field.
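To make the confidence column concrete: it is the trust-weighted agreement across the judgments on a row. A minimal sketch in Python, assuming each judgment is an (answer, trust) pair; the platform's exact formula may include additional adjustments:

    # Minimal sketch of trust-weighted confidence. Each judgment is an
    # (answer, trust) pair; the platform's exact formula may differ.
    from collections import defaultdict

    def confidence_scores(judgments):
        total_trust = sum(trust for _, trust in judgments)
        weight = defaultdict(float)
        for answer, trust in judgments:
            weight[answer] += trust
        # Confidence = trust supporting an answer / total trust on the row.
        return {answer: w / total_trust for answer, w in weight.items()}

    print(confidence_scores([("yes", 0.9), ("yes", 0.8), ("no", 0.7)]))
    # -> {'yes': 0.708..., 'no': 0.291...}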
Source Report
This report includes the original, unprocessed data that was uploaded to the job before it was run. If you do not select 'Exclude Test Questions', this report will also include the test question answers and reasons created for the job.
- _unit_id: A unique ID number created by the system for each row
- _created_at: The time the row was first uploaded into the job (UTC time zone)
- _updated_at: The time the row was most recently changed
Test Questions Report
This report includes only data on the test question rows in the job.
- _id: A unique ID number created by the system for each test question
- _pct_missed: The percentage of responses that were incorrect
- _judgments: The total number of judgments this test question received
- _hidden: This will be "true" if the test question is disabled
- _contention: The contentions submitted by contributors, separated by newlines
- _pct_contested: The percentage of contributors who answered the test question incorrectly and contested it
- _gold_pool: This will be "quiz" for test questions set only to quiz mode, "work" for test questions set only to work mode, and blank or "both" for test questions set to both. Contact your CSM to enable this UI feature in the platform. You can also contact the Appen Platform Support team for access. To learn more, visit this article.
- {{field}}: There will be a column for each field in the job, with a header equal to the field's name
- {{field}}_gold: The correct answer for the test question
- {{field}}_gold_reason: The reason/explanation for the correct answer
Note: The term "gold" refers to Test Questions
Contributor Report
This report contains information on the performance of individual contributors in the job.
- worker_id: A unique ID number assigned to the contributor
- external_id: A unique ID number assigned to the contributor by external channels
- judgments_count: The number of judgments submitted by the contributor
- missed_count: The number of test questions answered incorrectly by the contributor
- golds_count: The number of test questions answered by the contributor
- forgiven_count: The number of test questions answered incorrectly by the contributor that were then forgiven
- channel: The work channel through which the contributor accessed the job
- country: The country the contributor is from
- region: A region code for the area the contributor is from
- city: The city the contributor is from
- last_ip: The most recent IP address for the contributor
- flagged_at: The time at which the contributor was flagged
- rejected_at: The time at which the contributor's answers were rejected
- bonus: The amount the contributor has been bonused
- flag_reason: The requester-entered reason the contributor was flagged
- trust_overall: The trust level of the contributor
- submission_rate: Judgments per hour by a contributor. This is a rounded average of judgments submitted each hour, excluding hours with zero judgments, calculated from the time of the first submission. For example, if the first submission occurs at 16:11, each hour is counted from that point: 16:11-17:10, 17:11-18:10, and so on.
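A minimal sketch of that calculation in Python, assuming a contributor's submission timestamps are available as datetime objects; the platform's exact rounding behavior may differ:

    # Minimal sketch of the submission_rate calculation described above.
    from collections import Counter
    from datetime import datetime

    def submission_rate(timestamps):
        start = min(timestamps)
        # Bucket submissions into hour-long windows anchored at the first
        # submission (e.g. 16:11-17:10, 17:11-18:10, ...).
        buckets = Counter(int((t - start).total_seconds() // 3600) for t in timestamps)
        # Average only over hours with at least one judgment, then round.
        return round(sum(buckets.values()) / len(buckets))

    print(submission_rate([
        datetime(2024, 1, 1, 16, 11),
        datetime(2024, 1, 1, 16, 40),
        datetime(2024, 1, 1, 18, 30),
    ]))  # -> 2 (3 judgments over 2 non-empty hours, rounded)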
Note: The term "gold" refers to Test Questions
JSON Report
This report contains JSON formatted data for each judgment and row. You will most likely need to use a JSON parser in conjunction with this report.
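A minimal sketch for inspecting the report in Python, assuming it is delivered as newline-delimited JSON (one object per row) and using the hypothetical filename "job_report.json":

    # Minimal sketch: inspect a JSON report, assuming one JSON object per
    # line. "job_report.json" is a hypothetical filename.
    import json

    with open("job_report.json") as f:
        for line in f:
            row = json.loads(line)
            print(sorted(row.keys()))  # inspect the structure first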
Download All Annotations Report (Text Annotation Jobs Only)
- This report is only available in text annotation jobs.
- This report directly downloads the JSON files in the job, rather than links to them as in the full/aggregated reports. If metadata from the full or aggregated report is needed, those reports should be downloaded separately.
- The report downloads as a zip file containing two folders.
- One folder contains all the aggregated judgments, labeled by unit ID.
- The other folder contains the full-report judgments, labeled by judgment ID.
- This report is more secure than the aggregated report, as no URLs are generated.
- Note: This report may take a while to generate and download due to the size of its data files. However, the download will still be much faster than running scripts to scrape the results.
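A minimal sketch for walking the downloaded archive in Python, assuming the hypothetical filename "annotations.zip"; the exact folder names inside may vary by job:

    # Minimal sketch: list and parse every JSON file in the downloaded zip.
    # "annotations.zip" is a hypothetical filename; folder names may vary.
    import json, zipfile

    with zipfile.ZipFile("annotations.zip") as zf:
        for name in zf.namelist():
            if name.endswith(".json"):
                data = json.loads(zf.read(name))
                print(name, sorted(data.keys()))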