Guide to: Running an Audio Transcription Job


The cml:audio_transcription tag allows users to create an audio transcription job with custom labels and tag sets.

Note: For Dedicated customers, this feature is currently not available On-Premises; it is only available via the cloud multi-tenant Appen Data Annotation Platform.


Fig. 1: Audio Transcription tool interface for Contributors

Building a Job


  • The audio transcription tool supports the transcription of .wav, .mp3, and .ogg file types.
  • Your data must be CORS-configured.
  • All data access control features are supported for this tool, including Secure Data Access.


As this tool is in open beta, there is no Graphical Editor support yet. For access to the tool's CML gem, please reach out to your Customer Success Manager.


The parameters available for the job design are listed below. Some are required; others are optional.

  • type (required as of 16 January 2023)
    • transcription - enables the transcription field, as well as tags and timestamps.
      • Timestamps also require allow-timestamping="true".
    • labeling - enables the labels, as configured in the ontology.
    • segmentation - enables segmentation on the large waveform.
    • none - if no type is configured, only the audio player is available to contributors.
    • Types can be combined by passing a list, for example:
      • type="['labeling', 'transcription']" allows labeling and transcription (including tags and timestamps).
      • type="['labeling', 'segmentation']" allows labeling and segmentation.


  • source-data (required)
    • The column header from your source data containing the audio URLs to be annotated.
  • name (required)
    • The results header where the annotations will be stored.
  • segments-data (optional)
    • The column header from your source data containing the audio segmentation data (the start and end timestamps of each segment).
      • The tool uses this data to create the transcription box for each segment.
      • The data must follow the format described in the “Accepted Segmentation Input Schemas” section below.
      • If you do not have segmentation data, omit this parameter.

  • label (optional)
    • The question label that the contributors will see.
  • validates (optional)
    • Defines whether the element must be answered.
    • Accepts 'required'.
    • If not present, defaults to not required, unless this is the only CML element in the job design, in which case it defaults to 'required'.
  • review-data (optional)
    • Reads in existing transcriptions for an audio file.
    • If used, your source data must include a column with links to transcriptions in the same format the audio transcription tool outputs (see the “Reviewing Results” section below).
    • This parameter can be used for a peer review or model validation job.
    • See the “Review Mode” section for more details.
    • You can use raw text input as your review-data (e.g., for prompt-audio validation) as long as you have no other annotation data as input.

  • subset (optional)
    • Sets the tool to display only a random subset of the segments in each unit.
    • Only use this if the review-data parameter is present.
    • Accepts a value from 0 to 1.
    • Defaults to 1.
    • See the “Review Mode” section for more details.
  • force-fullscreen (optional)
    • Accepts 'true' or 'false'.
    • If 'true', a page of work contains a preview of each data row. When a contributor clicks to open a data row, the audio transcription tool loads into a fullscreen view.
    • If 'false', a page of work contains a view of the audio transcription tool for each data row. The contributor can open a fullscreen view of the tool at their discretion by clicking an icon in the top right corner of the tool or using a hotkey.
    • Defaults to 'false'.
  • task-type (optional)
    • Set task-type="qa" when designing a review or QA job. This parameter must be used in conjunction with review-data. See this article for more details.
  • allow-timestamping (optional)
    • Set allow-timestamping="true" to enable the timestamping functionality in your transcription task.
      • Contributors will see an “add timestamp” button in each transcription box that lets them insert timestamps within their transcription.
      • Timestamps can be used to generate more granular text/audio alignment and/or to let contributors correct and improve the segmentation points in the source data.
      • Timestamps appear in the output data like this: this is a <12.345/> transcription
    • Note: This parameter defaults to "false" if not declared.
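Because there is no Graphical Editor support, the element is written by hand in the Code Editor. The sketch below is a hypothetical TypeScript helper (buildAudioTranscriptionCml is not an Appen API) that assembles a cml:audio_transcription element from the parameters above and enforces two of the documented rules: subset must lie between 0 and 1, and task-type="qa" must be paired with review-data. The {{column}} placeholder syntax for source columns and the self-closing tag form are assumptions; check them against the CML your Customer Success Manager provides.

```typescript
// Hypothetical helper, not part of any Appen API: it only assembles the attribute
// string for a <cml:audio_transcription> element from the parameters described above.
type ToolType = "transcription" | "labeling" | "segmentation";

interface AudioTranscriptionOptions {
  sourceData: string; // source column containing the audio URLs (required)
  name: string; // results header (required)
  type?: ToolType[]; // omit to show only the audio player
  segmentsData?: string; // source column containing segmentation data
  label?: string;
  validates?: "required";
  reviewData?: string; // source column containing existing transcriptions
  subset?: number; // fraction of segments shown, 0 to 1
  forceFullscreen?: boolean;
  taskType?: "qa";
  allowTimestamping?: boolean;
}

// Assumption: source columns are referenced with Liquid-style {{column}} placeholders.
const col = (c: string): string => `{{${c}}}`;

// Escape characters that would break a double-quoted attribute value.
function escapeAttr(v: string): string {
  return v.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

function buildAudioTranscriptionCml(o: AudioTranscriptionOptions): string {
  if (o.subset !== undefined && (o.subset < 0 || o.subset > 1)) {
    throw new RangeError("subset must be between 0 and 1");
  }
  if (o.taskType === "qa" && o.reviewData === undefined) {
    throw new Error('task-type="qa" must be used together with review-data');
  }
  const attrs: Array<[string, string | undefined]> = [
    ["type", o.type && `[${o.type.map((t) => `'${t}'`).join(", ")}]`],
    ["source-data", col(o.sourceData)],
    ["name", o.name],
    ["segments-data", o.segmentsData && col(o.segmentsData)],
    ["label", o.label],
    ["validates", o.validates],
    ["review-data", o.reviewData && col(o.reviewData)],
    ["subset", o.subset?.toString()],
    ["force-fullscreen", o.forceFullscreen?.toString()],
    ["task-type", o.taskType],
    ["allow-timestamping", o.allowTimestamping?.toString()],
  ];
  const rendered = attrs
    // Drop parameters that were not set, so the tool's defaults apply.
    .filter((a): a is [string, string] => a[1] !== undefined && a[1] !== "")
    .map(([key, value]) => `${key}="${escapeAttr(value)}"`)
    .join(" ");
  return `<cml:audio_transcription ${rendered} />`;
}

// Example: labeling plus transcription over pre-segmented audio, with timestamps enabled.
const cml = buildAudioTranscriptionCml({
  sourceData: "audio_url",
  name: "annotation",
  type: ["labeling", "transcription"],
  segmentsData: "segments",
  validates: "required",
  allowTimestamping: true,
});
console.log(cml);
```

Unset options are simply omitted rather than written out with default values, so the tool's own defaults (for example force-fullscreen="false") stay in charge.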


The audio transcription ontology is where you define the metadata that transcribers will use to label and tag audio files or segments.

You can access the ontology by clicking the 'Manage Audio Transcription Ontology' link in the right corner of the job's Design page.


Fig. 2: Audio Transcription Ontology Manager

The top-level metadata defined in the audio transcription ontology consists of labels, event tags, and span tags.

  • Labels

    • Contributors will apply your labels at the segment level.

    • Labels are defined as members of label groups. Groups are just a way to keep related labels together according to common attributes and rules; groups themselves are not metadata to be labeled.

    • Label groups require:

      • a name

      • at least one label inside them.

        • Labels must be unique, even between groups.

    • At the label group level, you can define:

      • whether selecting a label from the group is mandatory

      • whether users can select multiple labels from the group

      • whether the group is non-transcribable

        • By default, a segment is assumed to be transcribable, and the transcribable labels are shown.

          • In this case, mandatory|transcribable applies.

        • If a segment is marked as “nothing to transcribe”, only non-transcribable labels are available.

          • In this case, mandatory|non-transcribable applies.

          • The transcription box is not available.

    • Labels are optional: if you need no labels, you do not need to create any groups. If you do include labels, they must be inside a group.

  • Event Tags

    • Event tags are optional.

    • Event tags require a name.

    • You may also provide a description for each tag, which will be visible to the contributor in the tool when they click on the info icon for that tag.

  • Span Tags

    • Span tags are optional.

    • Span tags require a name.

    • You may also provide a description for each tag, which will be visible to the contributor in the tool when they click on the info icon for that tag.
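To sanity-check an ontology before launch, the rules above can be expressed as a small validator. This is a sketch under assumptions: the LabelGroup, Tag, and Ontology types are a hypothetical model, not the format the Ontology Manager stores or exports. validateOntology checks that every group has a name and at least one label, that labels are unique across groups, and that every tag has a name; availableLabels shows which labels would be offered depending on whether a segment is marked “nothing to transcribe”.

```typescript
// Hypothetical model of the ontology rules described above; not an Appen storage or export format.
interface LabelGroup {
  name: string;
  labels: string[];
  mandatory?: boolean; // selecting a label from this group is required
  multiSelect?: boolean; // more than one label may be selected
  transcribable?: boolean; // defaults to true; false = non-transcribable group
}

interface Tag {
  name: string;
  description?: string; // shown behind the info icon in the tool
}

interface Ontology {
  labelGroups: LabelGroup[]; // may be empty if no labels are needed
  eventTags: Tag[];
  spanTags: Tag[];
}

// Returns a list of rule violations; an empty list means the ontology passes these checks.
function validateOntology(o: Ontology): string[] {
  const errors: string[] = [];
  const owner = new Map<string, string>(); // label -> first group that used it
  for (const g of o.labelGroups) {
    if (g.name.trim() === "") errors.push("Every label group needs a name.");
    if (g.labels.length === 0) errors.push(`Group "${g.name}" needs at least one label.`);
    for (const label of g.labels) {
      const first = owner.get(label);
      if (first !== undefined) {
        errors.push(`Label "${label}" in group "${g.name}" is already used in group "${first}".`);
      } else {
        owner.set(label, g.name);
      }
    }
  }
  for (const t of [...o.eventTags, ...o.spanTags]) {
    if (t.name.trim() === "") errors.push("Every event tag and span tag needs a name.");
  }
  return errors;
}

// Labels offered for a segment: transcribable groups by default, only
// non-transcribable groups once the segment is marked "nothing to transcribe".
function availableLabels(o: Ontology, nothingToTranscribe: boolean): string[] {
  return o.labelGroups
    .filter((g) => (g.transcribable ?? true) !== nothingToTranscribe)
    .flatMap((g) => g.labels);
}

const ontology: Ontology = {
  labelGroups: [
    { name: "Speaker", labels: ["adult", "child"], mandatory: true },
    { name: "Non-speech", labels: ["silence", "music"], transcribable: false },
    { name: "Background", labels: ["music"], multiSelect: true },
  ],
  eventTags: [{ name: "cough", description: "The speaker coughs" }],
  spanTags: [{ name: "foreign" }],
};
const problems = validateOntology(ontology);
console.log(problems);
console.log(availableLabels(ontology, true));
```

The example deliberately reuses "music" in two groups, which the uniqueness rule rejects.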

Reviewing Results

Results will be provided as a secure link to a JSON file describing the annotations.

Important note: For security reasons, JSON result links expire 15 days after generation. To receive non-expired result links, regenerate the result reports.

The objects in the JSON include the following: 

  • For each segment:

    • id
      • The universally unique identifier of the segment.
    • startTime and endTime
      • Expressed in seconds, with millisecond precision.
      • These fields are inherited from the segmentation data.
    • labels
      • The labels applied by the contributor.
    • transcription
      • The transcription text entered by the contributor. Tags and timestamps also appear in this field.
  • For the entire audio file:
    • nothingToTranscribe
      • Boolean: true or false.
      • true if the contributor indicated they were unable to transcribe the entire audio file.
    • ableToAnnotate
      • Boolean: true or false.
      • false if the tool was unable to load the audio file.
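Because timestamps are embedded in the transcription text, downstream alignment code has to parse them out. The sketch below assumes timestamps look exactly like <12.345/> (as in the allow-timestamping example) and are absolute positions in the audio file, in seconds like startTime and endTime; splitAtTimestamps and TimedSpan are hypothetical names. Tag markup is not parsed, since its format is not described here.

```typescript
// A piece of transcription text with the time range it covers (hypothetical type).
interface TimedSpan {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

// Matches timestamps written as <12.345/>; assumed to be absolute seconds in the audio file.
const TIMESTAMP = /<(\d+(?:\.\d+)?)\/>/g;

// Split one segment's transcription at its timestamps so each piece gets a time range.
function splitAtTimestamps(transcription: string, segStart: number, segEnd: number): TimedSpan[] {
  const spans: TimedSpan[] = [];
  let cursor = segStart; // start time of the piece being collected
  let last = 0; // string index just after the previous timestamp
  for (const m of transcription.matchAll(TIMESTAMP)) {
    const t = Number(m[1]);
    if (t < cursor || t > segEnd) {
      throw new RangeError(`timestamp ${t} is outside ${cursor} to ${segEnd}`);
    }
    const idx = m.index ?? 0;
    spans.push({ text: transcription.slice(last, idx).trim(), start: cursor, end: t });
    cursor = t;
    last = idx + m[0].length;
  }
  spans.push({ text: transcription.slice(last).trim(), start: cursor, end: segEnd });
  // Drop empty pieces, e.g. when a timestamp sits at the very start or end of the text.
  return spans.filter((s) => s.text.length > 0);
}

const spans = splitAtTimestamps("this is a <12.345/> transcription", 11, 13.5);
console.log(spans);
```

If your export stores the transcription as an array of strings (the schema comment in the next section declares string[]), apply the helper to each element or to the joined text, whichever matches your data.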

Annotation Schema

interface AudioToolNewOutput {
  annotation: {
    segments: {
      // main information about the segment
      id: string; // prefixed with "segment" so it is not confused with the annotation id
      startTime: number;
      endTime: number;

      // Kept as extra information until the layers feature is removed from the tool entirely;
      // otherwise users would see layers on first load but not when loading judgment or
      // autosaved data (on page refresh), and this inconsistency would cause confusion.
      layerId: string;

      labels: string[]; // present if 'type' includes labeling and 'task-type' is labeling or qa
      transcription: string[]; // present if 'type' includes transcription and 'task-type' is labeling or qa

      metadata: {
        // all other information for the segment
        comment?: string; // can be present in any qa judgment
        feedbackAcknowledged: boolean; // comes from the acknowledgment task

        ableToAnnotate: boolean;
        nothingToTranscribe: boolean; // not renamed to nothingToAnnotate because the field is not used for anything else
      };
    };
  };
}
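A common first step with downloaded results is a per-file summary. The sketch below is illustrative only: OutputSegment and summarizeSegments are hypothetical, it assumes the segments arrive as an array (the schema above does not show the container type), and it reads only id, startTime, endTime, and labels. Durations are rounded to whole milliseconds, matching the precision of the time fields.

```typescript
// Hypothetical minimal view of one output segment, using field names from the schema above.
interface OutputSegment {
  id: string;
  startTime: number; // seconds, millisecond precision
  endTime: number; // seconds, millisecond precision
  labels?: string[]; // only present under the conditions noted in the schema comment
}

interface ResultSummary {
  segmentCount: number;
  totalDurationMs: number;
  labelCounts: Map<string, number>;
}

function summarizeSegments(segments: OutputSegment[]): ResultSummary {
  const labelCounts = new Map<string, number>();
  let totalDurationMs = 0;
  for (const s of segments) {
    if (s.endTime < s.startTime) {
      throw new RangeError(`segment ${s.id} ends before it starts`);
    }
    // Round to whole milliseconds to avoid floating-point noise in the sum.
    totalDurationMs += Math.round((s.endTime - s.startTime) * 1000);
    for (const label of s.labels ?? []) {
      labelCounts.set(label, (labelCounts.get(label) ?? 0) + 1);
    }
  }
  return { segmentCount: segments.length, totalDurationMs, labelCounts };
}

const summary = summarizeSegments([
  { id: "seg-1", startTime: 0, endTime: 1.5, labels: ["speech"] },
  { id: "seg-2", startTime: 1.5, endTime: 2.25, labels: ["speech", "noise"] },
]);
console.log(summary.segmentCount, summary.totalDurationMs, summary.labelCounts);
```

The label names in the example are placeholders; use the labels defined in your own ontology.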

Review Mode

Review mode is an experience designed specifically for quality management.


For job creators

On the job creator’s side, when using the review-data CML parameter, you can also specify a subset so that the tool displays only a random sample of the segments in each audio file. This makes the review process more efficient: reviewers still review every audio file, but much more quickly. This feature is ideal if your goal is to gauge the transcription quality of each audio file.

In the review job’s output, you will see two additional fields under “metadata”:

  1. “original_text”: if the reviewer changes the transcription, this field records the original transcription that was loaded as input data. This makes it easy to calculate the word error rate (WER) of each unit.
  2. “review_status”: segments that were randomly selected for the review job have this attribute set to “reviewed”.
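Word error rate is the word-level edit distance (substitutions, deletions, and insertions) between a reference and a hypothesis, divided by the number of reference words. The sketch below is a minimal implementation of that standard definition, not an Appen-provided metric, and wordErrorRate is a hypothetical name. To measure the original transcriber's quality, one reasonable choice is to pass the reviewer's final transcription as the reference and original_text as the hypothesis; tags and timestamps are not stripped, so remove them first if they should not count.

```typescript
// Split on whitespace; punctuation and case are left as-is (normalize beforehand if needed).
function tokenize(text: string): string[] {
  return text.split(/\s+/).filter((w) => w.length > 0);
}

// Standard WER: word-level Levenshtein distance divided by the reference length.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  // Convention chosen for this sketch: an empty reference scores 0 if the hypothesis
  // is also empty, otherwise 1 (WER is undefined when there are no reference words).
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;

  // prev[j] = edit distance between the first (i - 1) reference words and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr.push(Math.min(substitution, deletion, insertion));
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Example: the reviewer restored one word that the original transcription dropped.
const wer = wordErrorRate("the cat sat on the mat", "the cat sat on mat");
console.log(wer.toFixed(3));
```

For a unit-level figure, summing edit counts and reference word counts across segments before dividing weights long segments appropriately, whereas averaging per-segment WER does not.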

Please note that this feature is designed for review jobs with only one judgment per row.


For contributors

Review jobs include a “reviewed” button, which ensures that reviewers go through every segment before they can submit. For each segment, the reviewer must click the “reviewed” button after reviewing the transcription.

Additional Notes:

This product is in BETA, so please consider the following important notes: 

  1. Reviewing/viewing submitted annotations via the unit page is not currently supported. 
  2. The job must be set up in the Code Editor; the tool is not supported in the Graphical Editor yet. 
  3. Audio Transcription jobs do not support test questions or aggregation at this stage. 
  4. Launching this type of job requires one of our trusted contributor channels. Please reach out to your Customer Success Manager to set this up. 

Accepted Segmentation Input Schemas


interface SegmentInput {
  id: string;
  startTime: number;
  endTime: number;
  ontologyName?: string;
  layerId?: string;
}

interface AudioAnnotationData {
  annotation: SegmentInput[][];
  nothingToAnnotate: boolean;
}

type SegmentsDataInput = SegmentInput[] | AudioAnnotationData;

  • Segments input data is optional.
  • The old input format (cml:audio_annotation output) is partially supported: segment boundaries are respected, but ontology classes and names will not work in the new tool or ontology format.
  • If you have segments or other pre-annotations to display, they must be in the format above.
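If you generate segments-data yourself, it helps to check it against the accepted input types before upload. The sketch below repeats the types so it is self-contained and adds a hypothetical normalizeSegments helper that accepts either shape, flattens the nested annotation arrays, rejects segments whose startTime is not before endTime, and sorts by start time. The ordering and validity checks are assumptions about what well-formed input looks like, not documented requirements of the tool.

```typescript
// Input types from the "Accepted Segmentation Input Schemas" section, repeated for self-containment.
interface SegmentInput {
  id: string;
  startTime: number;
  endTime: number;
  ontologyName?: string;
  layerId?: string;
}

interface AudioAnnotationData {
  annotation: SegmentInput[][];
  nothingToAnnotate: boolean;
}

type SegmentsDataInput = SegmentInput[] | AudioAnnotationData;

// Hypothetical helper: accept either shape, flatten, validate, and sort by start time.
function normalizeSegments(input: SegmentsDataInput): SegmentInput[] {
  const flat: SegmentInput[] = Array.isArray(input) ? input : input.annotation.flat();
  for (const s of flat) {
    const valid = Number.isFinite(s.startTime) && Number.isFinite(s.endTime) && s.startTime < s.endTime;
    if (!valid) {
      throw new RangeError(`segment ${s.id}: startTime must be a number before endTime`);
    }
  }
  // Copy before sorting so the caller's array is not mutated.
  return [...flat].sort((a, b) => a.startTime - b.startTime);
}

const segments = normalizeSegments({
  annotation: [[
    { id: "s2", startTime: 2.5, endTime: 4 },
    { id: "s1", startTime: 0, endTime: 2.5 },
  ]],
  nothingToAnnotate: false,
});
console.log(segments.map((s) => s.id));
```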
