Guide to: Quality Audit


Table of Contents

  1. Introduction
  2. Getting Started
  3. Setup Audit
  4. Conducting Your Audit
  5. Auditing Results
  6. Accuracy Score
  7. Audit Report


1. Introduction

Quality Audit allows you to review the results of your jobs directly on the platform. It aims to simplify and accelerate the audit process to ensure you have greater insight into your data before training a machine learning model. 


2. Getting Started

Quality Audit requires that the job have at least one finalized row of data, at which point you can set up your Grid View and start your audit. Click on 'Audit' on the top navigation bar to access the Audit view.


When you get to the Audit page, you’ll see one of these possible screens:


i. Unsupported Job

If the current job does not support Quality Audit, this screen will be shown.


Currently supported tags include:

  • cml:checkbox

  • cml:checkboxes

  • cml:radios

  • cml:ratings

  • cml:select

  • cml:text

  • cml:textarea

  • cml:shapes

  • cml:video_shapes

  • cml:image_transcription

  • cml:image_segmentation

  • cml:text_annotation

  • cml:text_relationships

  • cml:audio_annotation

  • cml:audio_transcription (including the new judgment format introduced in Q4 2022)

  • cml:taxonomy_beta

  • cml:taxonomy_tool

ii. Generate Aggregations


Aggregations will be generated automatically when a job finishes. You can still manually generate aggregations for a job that has finalized rows but has not yet finished by clicking this button.

Note: This product only supports top answer or shape aggregations (aggregation="agg", or aggregation="box-agg", "polygon-agg", etc.) and text annotation aggregations ("tagg"); fields that use other aggregation types cannot be filtered.
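For reference, here is a hedged sketch of how a question opts into a filterable aggregation. The question text is taken from the example later in this article; the answer values and overall job design are invented, and only the aggregation attribute value comes from the note above:

```xml
<!-- Hypothetical job design: a radios question using top-answer aggregation ("agg").
     Quality Audit can filter on this field; other aggregation types cannot be filtered. -->
<cml:radios label="Is this a photo of the restaurant's menu, food, interior, or exterior?" validates="required" aggregation="agg">
  <cml:radio label="Menu" value="menu" />
  <cml:radio label="Food" value="food" />
  <cml:radio label="Interior" value="interior" />
  <cml:radio label="Exterior" value="exterior" />
</cml:radios>
```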


3. Setup Audit


Once your aggregations have been generated, you can set up your audit. The Audit Configuration modal lets you configure how Quality Audit displays Grid View, in three sections:


  • Sample Units Amount

    • Determine the number of units that will be available for auditing in Grid View (e.g. a value of 40% for a 1000-unit job will result in 400 units available)

    • Leaving the field blank results in all finalized units being available for auditing

  • Data Source For Audit Preview

    • Up to three columns can be selected from your source data to display in Grid View, along with the data type. Grid View can render the following data types: 

      • Text

      • Image

      • Audio 

      • Video 

      • URL 

      • HTML

    • Be sure to select the correct data type for the column. Otherwise, the data won’t render.

    • In a Text Relationships question, when selecting the type Text for the source data column, Quality Audit will present the text from the original job.


  • Question For Audit Preview

    • This section allows you to filter for answer values to specific questions. This filters the rows that are returned in Grid View. For example, if you filter for ‘food’ in the modal below, your Grid View will display rows of data where the top answer was ‘food’ for the question ‘Is this a photo of the restaurant’s menu, food, interior, or exterior?’

    • For image annotation or transcription jobs with only one judgment per row, be sure to select 'Include Single [shape type]' in the aggregation settings

    • You can filter by ontology class if you are auditing a question with ontology support (with the exception of Text Relationships).

      • This filter uses an OR operator; images containing one or more of the selected classes will be returned in Grid View.

      • For jobs without ontology, you will not be able to filter on this field; instead, after this field is selected, each card in Grid View will show the count of total annotations in the image.


Radios, Checkbox, Checkboxes, Select, Ratings, and Taxonomy


Text, Textarea, Tools without ontology, and Text Relationships


Tools with ontology enabled: Shapes, Video Shapes, Image Transcription, Image Segmentation, Text Annotation, Audio Annotation, and Audio Transcription


Grid View

Once you have configured the Data Sources, Grid View is displayed. Grid View shows each unit with its source data and the answer distribution for the selected question.


The Grid View has three sections:

  • Topbar: located at the top, it contains options for configuring which units, questions, and data sources to show;

  • Sidebar: located on the right, it contains information about accuracy, along with options for the Report and for regenerating aggregations;

  • Units: the queried units themselves, each presented as a Card containing the configured Data Sources and the selected Question.


4. Conducting Your Audit

Once you’ve set up your Grid View, you’re ready to start auditing your job! In the Grid View, you’ll notice a few buttons: 


  • Configure Tile 

    • Customize Data Source

      • This modal is equivalent to the Data Source For Audit Preview section of Setup Audit

    • Customize Question

      • This modal is equivalent to the Question For Audit Preview section of Setup Audit

  • Filters

    • Audit Status

      • Audited: this will only display the rows you’ve already audited. See the ‘Auditing Results’ section below.

      • Unaudited: this will only display the rows you’ve not yet audited.

    • Question Confidence Score

      • This filter is only enabled for questions that support aggregation

Filters with confidence score enabled

Filters with confidence score disabled


  • Sort by


    • ID: Descending Order 

      • Unit ID in descending order 

        • This is the default sorting.

    • ID: Ascending Order 

      • Unit ID in ascending order 

    • Confidence: High to Low (on supported questions)

      • Confidence for the question currently displayed in Grid View (configured via the Customize Question modal). For information on confidence scores, check out our Confidence Score article.  

    • Confidence: Low to High (on supported questions)

    • Randomize

      • Display units in a random order



  • The sidebar contains information about the total rows audited and the accuracy alongside options for downloading an audit report and regenerating the aggregations.

  • Accuracy is the ratio of correct audited questions to total audited questions. It is divided into three sections, when applicable:

    • Filtered Units: Units that match the current filters and are present in Grid View. Present when there are applied filters

    • Sampled Units: Units present in the job sample that was defined in the Setup Audit phase. Present when the sample % is lower than 100%

    • Total Finalized Units: All finalized units in the job. This section is always present


  • View Details

    • When clicking on View Details in any of the three accuracy sections, the Accuracy modal for the section is opened. It shows the overall accuracy and a detailed accuracy for each question, for the scope: entire job, job sample, or filtered units

    • Note: a question’s accuracy is the ratio of correct audited answers to total audited answers across all units for that question.


  • Download Unit Report

    • Clicking this button downloads the CSV report for the audited units only.

  • Regenerate Aggregations

    • Clicking this button starts a new aggregation run, bringing in any newly finalized units.



  • Each unit card shows the selected source data, the unit ID, and up to three aggregated answers, ontology classes, or the number of annotated shapes, depending on the chosen question.


Types with Ontology


Types without ontology


Types with agg support

Text Relationships


  • Clicking View Details will open the Detail View, which allows you to conduct an in-depth audit of your results.


5. Auditing Results

On the Detail View, you can mark each field of a row correct or incorrect by clicking the ‘X’ (incorrect) or the check mark (correct). If incorrect, you can choose the correct answer; in either case, you can provide a reason for your decision.


If auditing an annotation tool question, you will be presented with the tool along with the judgments and aggregation if available:


Select a judgment or the aggregation from the dropdown in the top left to view the annotations.

For cml:image_annotation and cml:text_annotation, after you have marked the field as incorrect, you will be presented with some additional options to choose from:

Image Annotation (Shapes)

  • Too many annotations

  • Missing annotations

  • Incorrect classes

  • Annotations too loose

  • Annotations too tight

Text Annotation

  • Incorrect classes

  • Incorrect spans

  • Missing spans

  • Missing classes

  • Didn't follow guidelines

  • Nonsensical Annotations

Choose as many as apply. You can also add a freeform reason to elaborate on any of the above.

For now, you cannot edit tool annotations to create corrections.


Shapes options

Text Annotation Options

The corrected answers and the provided reasons are stored in the audit report (see ‘Audit Report’ section below) for you to discern patterns in the results and for general tracking purposes. These corrected answers do not overwrite actual answers in the aggregated report.


Saving Changes

Changes are saved only when you click the Close button on the Detail View modal or the Previous/Next buttons.


6. Accuracy Score

Once you’ve audited at least one row, you’ll notice the accuracy value in the Sidebar changes. Clicking on View Details opens the Accuracy Modal, which provides a breakdown of your per-field accuracy, along with an overall job accuracy. These are calculated as follows: 

  • Per-field accuracy: the number of correct answers out of total rows audited 

  • Overall: the average of all the fields’ accuracies


In the example above, the first question in this job – whether the photo depicts a restaurant's menu, food, interior, or exterior – was marked correct for all five rows audited, resulting in 100% accuracy. The second question has low accuracy at 1/4 correct or 25%. The third field has pretty high accuracy, with 7 out of 8 correct, or 87.50%. The average of these three fields is 70.83%, which is shown near the top of the modal. 
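The arithmetic above can be sketched as follows. The field names and (correct, audited) counts are illustrative, taken from the worked example rather than from any actual product output:

```python
# Sketch of the accuracy math described above (field names are illustrative).
# Per-field accuracy = correct answers / rows audited for that field;
# overall accuracy = the plain average of the per-field accuracies.

audited = {
    "photo_type": (5, 5),   # (correct, audited) -> 100.00%
    "question_2": (1, 4),   # -> 25.00%
    "question_3": (7, 8),   # -> 87.50%
}

per_field = {q: correct / total for q, (correct, total) in audited.items()}
overall = sum(per_field.values()) / len(per_field)

print({q: f"{acc:.2%}" for q, acc in per_field.items()})
print(f"overall: {overall:.2%}")  # -> overall: 70.83%
```

Note that the overall score is an unweighted average of the per-field accuracies, not a pooled ratio across all audited answers.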

These values help you pinpoint where your job is performing well and where it could be improved, whether in the instructions, test questions, job design, or any other way job accuracy is impacted. 


7. Audit Report

In addition to an accuracy score, once you’ve audited at least one row of data, there will also be an audit report available to download. This report contains the following: 

  • The unit ID

  • The source data from your job

  • {question}_aggregated 

    • The aggregated answer for the field 

  • {question}_confidence 

    • The confidence score for the field 

  • {question}_correct_yn 

    • Whether each field was marked correct or incorrect 

      • A value of ‘1’ is correct, and ‘0’ is incorrect. 

  • {question}_audit 

    • The correct answer provided for this field 

    • If the field was marked correct, this value matches {question}_aggregated; otherwise, it contains the correct answer you provided during the audit.

      • For fields marked incorrect in image annotation jobs, this column will contain any provided checkbox reasons outlined above.

  • {question}_audit_reason 

    • The reason you provided for this answer 
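A minimal sketch of consuming this report, assuming a single question named photo_type. The column names follow the {question}_* patterns above; the sample rows are invented:

```python
import csv
import io

# Invented sample of the audit report for one question, "photo_type",
# following the column patterns described above.
report_csv = """\
unit_id,photo_type_aggregated,photo_type_confidence,photo_type_correct_yn,photo_type_audit,photo_type_audit_reason
101,food,0.92,1,food,
102,menu,0.55,0,interior,Photo shows the dining room
"""

rows = list(csv.DictReader(io.StringIO(report_csv)))

# correct_yn: '1' means the audited field was marked correct, '0' incorrect.
incorrect = [r for r in rows if r["photo_type_correct_yn"] == "0"]
for r in incorrect:
    # For incorrect fields, the _audit column holds the corrected answer.
    print(r["unit_id"], r["photo_type_aggregated"], "->", r["photo_type_audit"],
          "|", r["photo_type_audit_reason"])
```

Grouping the incorrect rows by their audit reasons is a simple way to surface the patterns mentioned above.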


