Guide to: Running an Audio Transcription Job


The cml:audio_transcription tag allows users to create an audio transcription job with custom label and tag sets.

Note: For Dedicated customers, this feature is currently not available On-Premises; it is only available via the multi-tenant cloud Appen Data Annotation Platform.


Fig. 1: Audio Transcription tool interface for Contributors

Building a Job


  • The audio transcription tool supports the transcription of .wav, .mp3, and .ogg file types.
  • Your data must be hosted on storage with CORS configured.
  • All data access control features are supported for this tool, including Secure Data Access.


As this tool is in open beta, there is no Graphical Editor support yet. For access to the tool's CML gem, please reach out to your Customer Success Manager.


Below are the parameters available for the job design. Some are required, while others are optional.

  • source-data (required)
    • The column header from your source data containing the audio URLs to be annotated.
  • name (required)
    • The results header where the annotations will be stored.
  • audio-annotation-data (optional)
    • The column header from your source data containing the audio segmentation data (the start and end timestamps of each segment).
      • The tool uses this data to create the transcription box for each segment.
      • The tool expects the data to be in the same format as the output of Appen's audio annotation tool.
        • An example file of this formatted data is attached at the bottom of this article.
      • If you do not have segmentation data, omit this parameter. You will still need to create an ontology with at least one class.

  • label (optional)
    • The question label that the contributors will see.
  • validates (optional)
    • Defines whether the element must be answered.
    • Accepts 'required'.
    • If omitted, defaults to not required, unless this is the only CML element in the job design, in which case it defaults to 'required'.
  • review-data (optional)
    • Reads in existing transcriptions for an audio file.
    • If used, your source data must include a column with links to transcriptions in the same format that the audio transcription tool outputs (see the 'Reviewing Results' section below).
    • Use this parameter to set up a peer review or model validation job.
    • See the “Review Mode” section for more details.
  • subset (optional)
    • Sets up the tool to display only a random subset of the segments in each unit.
    • Use this only when the review-data parameter is present.
    • Accepts a value from 0 to 1 (the fraction of segments to display).
    • Defaults to 1.
    • See the “Review Mode” section for more details.
  • force-fullscreen (optional)
    • Accepts 'true' or 'false'.
    • If 'true', a page of work contains a preview of each data row. When a contributor clicks to open a data row, the audio transcription tool loads into a fullscreen view.
    • If 'false', a page of work contains a view of the audio transcription tool for each data row. The contributor can open a fullscreen view of the tool at their discretion by clicking an icon in the top right corner of the tool or using a hotkey.
    • Defaults to 'false'.
  • task-type (optional)
    • Set task-type="qa" when designing a review or QA job. This parameter must be used in conjunction with review-data. See this article for more details.
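Because the tool has no Graphical Editor support, it can help to sanity-check parameter values before pasting the element into the Code Editor. The sketch below is a hypothetical Python helper (not part of any Appen SDK) that encodes the rules listed above; the function names, the dict-of-strings input, and the example column names are assumptions made for illustration.

```python
# Hypothetical helper, not part of any Appen SDK: checks cml:audio_transcription
# attribute values against the parameter rules described in this guide.
from typing import Optional

REQUIRED_PARAMS = ("source-data", "name")


def check_audio_transcription_params(params: dict) -> list:
    """Return a list of problems; an empty list means every rule passed."""
    problems = []
    for key in REQUIRED_PARAMS:
        if not params.get(key):
            problems.append(f"missing required parameter: {key}")

    if "validates" in params and params["validates"] != "required":
        problems.append("validates only accepts 'required'")

    # force-fullscreen defaults to 'false' when absent.
    if params.get("force-fullscreen", "false") not in ("true", "false"):
        problems.append("force-fullscreen must be 'true' or 'false'")

    if "subset" in params:
        if "review-data" not in params:
            problems.append("subset should only be used together with review-data")
        try:
            fraction = float(params["subset"])
        except ValueError:
            fraction = None
        # NaN fails both comparisons, so it is reported as out of range too.
        if fraction is None or not 0 <= fraction <= 1:
            problems.append("subset must be a number from 0 to 1")

    if params.get("task-type") == "qa" and "review-data" not in params:
        problems.append("task-type='qa' must be used with review-data")

    return problems


def effective_validates(params: dict, cml_element_count: int) -> Optional[str]:
    """Apply the documented default: 'required' only if this is the sole CML element."""
    if "validates" in params:
        return params["validates"]
    return "required" if cml_element_count == 1 else None


# Example review-job parameters (column names are hypothetical).
params = {"source-data": "audio_url", "name": "transcription",
          "review-data": "prior_transcription", "subset": "0.25", "task-type": "qa"}
print(check_audio_transcription_params(params))  # []
```

Returning a list of messages rather than raising on the first error lets you see every problem in one pass.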


Audio transcription jobs use a custom ontology that is specific to the audio transcription tool.

You can access the ontology by clicking the 'Manage Audio Transcription Ontology' link in the right corner of the job's Design Page.

Important note: The job's ontology should exactly match the ontology of the audio segmentation data. In other words, every class present in the audio segmentation job should also be included in the audio transcription ontology, with the same casing.
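Before launching, you can compare the two ontologies programmatically. The sketch below is a hypothetical check (not an Appen feature) that assumes you have already collected the class names used in the segmentation data, for example from its ontologyName values, and the class titles defined in the transcription ontology. Matching is case-sensitive, as the Title attribute below requires, and near misses that differ only by casing are reported separately.

```python
# Hypothetical check, not an Appen feature: compares the class names used in the
# segmentation data with the classes defined in the transcription ontology.


def missing_classes(segmentation_classes, transcription_classes):
    """Segmentation classes with no exact, case-sensitive match in the transcription ontology."""
    defined = set(transcription_classes)
    return sorted({name for name in segmentation_classes if name not in defined})


def case_only_mismatches(segmentation_classes, transcription_classes):
    """Map each missing class to an ontology class that differs from it only by casing."""
    by_lower = {name.lower(): name for name in transcription_classes}
    return {
        name: by_lower[name.lower()]
        for name in missing_classes(segmentation_classes, transcription_classes)
        if name.lower() in by_lower
    }


segmentation = ["Speech", "Noise", "Speech"]  # e.g. collected from ontologyName values
transcription_ontology = ["speech", "Noise"]
print(missing_classes(segmentation, transcription_ontology))       # ['Speech']
print(case_only_mismatches(segmentation, transcription_ontology))  # {'Speech': 'speech'}
```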



Fig. 2: How to find the Ontology Manager via the Design Page

Fig. 3: The Ontology Manager

Every class in the ontology will have the following attributes:

  • Title
    • This is the class name that will appear in the results and to the contributors working in the job.
      • Important note: the casing of the class must match the casing of the class in the audio segmentation job ontology.
  • Transcribe this layer
    • If a class has been segmented but should not be transcribed, disable transcription for it by deselecting the 'Transcribe this layer' checkbox.
    • When disabled, segments and transcription bubbles for that class do not appear in the contributor annotation interface.
      • Example: If segmentation has been performed to mark periods of 'noise', it may make sense to disable transcription for this class.
  • Allow Timestamping
    • If checked, each transcription box shows an “add timestamp” button that lets contributors insert timestamps within their transcription.
    • Timestamps appear in the output data like this: this is a <12.345/> transcription
    • Check this box if you want more granular text/audio alignment, or want to let contributors correct and improve the segmentation points in the source data.
  • Description
    • Use descriptions to give contributors clear instructions on what should be annotated with the class.
  • Color
    • The color of each class.
    • Unlike in other annotation tools, the platform auto-assigns a color to each class from 16 pre-selected, easy-to-read colors.
    • To use custom colors, upload your ontology manually as a CSV file.
      • A sample CSV file with accepted ontology format is attached to the bottom of this article.
  • Spans and Events
    • You can optionally let contributors add span and event tags to their transcriptions by adding one or more span or event labels to a class in the ontology editor.
    • You may also provide a description for each span and event tag, which will be visible to the contributor in the tool when they click on the info icon for that span or event.
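If you enable timestamping, downstream code usually needs to strip or split on the inline timestamps. The sketch below assumes timestamps are always written as a number of seconds inside a self-closing tag, exactly as in the <12.345/> example above, and that they share the timeline of the segment's startTime and endTime; both points, and all function names, are assumptions rather than documented behavior.

```python
import re

# Assumption: inline timestamps always look like <12.345/>, i.e. a number of seconds
# inside a self-closing tag, as in the output example in this guide.
TIMESTAMP = re.compile(r"<(\d+(?:\.\d+)?)/>")


def extract_timestamps(text):
    """Return the inline timestamps in a transcription, in order, as floats."""
    return [float(value) for value in TIMESTAMP.findall(text)]


def strip_timestamps(text):
    """Remove inline timestamps and normalize the whitespace they leave behind."""
    return " ".join(TIMESTAMP.sub(" ", text).split())


def split_at_timestamps(text, start_time, end_time):
    """Split one segment's text into (start, end, text) chunks at each timestamp.

    Assumes timestamps share the timeline of the segment's startTime/endTime.
    """
    pieces = TIMESTAMP.split(text)  # [text, ts, text, ts, ..., text]
    chunks = [piece.strip() for piece in pieces[0::2]]
    bounds = [start_time] + [float(ts) for ts in pieces[1::2]] + [end_time]
    return [(bounds[i], bounds[i + 1], chunk) for i, chunk in enumerate(chunks)]


print(split_at_timestamps("this is a <12.345/> transcription", 10.0, 14.0))
# [(10.0, 12.345, 'this is a'), (12.345, 14.0, 'transcription')]
```

Because the pattern has a single capturing group, re.split alternates text and timestamp values, which is what the slicing relies on.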

Important note: The tool supports a maximum of 16 layers of class instances. Each instance of a class counts as one layer, so a class with several instances uses several layers. Class instances beyond the 16th layer cannot be displayed as layers, but they can still be transcribed.
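To spot audio files that will hit this limit before launch, you can count layers in the segmentation data. This hypothetical sketch assumes that each distinct layerID (a field inherited from the segmentation data, see 'Reviewing Results' below) corresponds to one class instance, and therefore to one layer in the tool.

```python
# Hypothetical pre-launch check: assumes each distinct layerID in the segmentation
# data corresponds to one class instance, i.e. one layer in the tool.
MAX_LAYERS = 16


def count_layers(segments):
    """Number of distinct layers referenced by a file's segments."""
    return len({segment["layerID"] for segment in segments})


def exceeds_layer_limit(segments, limit=MAX_LAYERS):
    """True if some class instances will not be displayed as layers."""
    return count_layers(segments) > limit


# Illustrative input: 40 segments spread across 17 layers.
segments = [{"layerID": f"layer-{i % 17}"} for i in range(40)]
print(count_layers(segments), exceeds_layer_limit(segments))  # 17 True
```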


Fig. 4: Editing a Class in the Audio Transcription Ontology Manager


Fig. 5: Enabling Span and Event Tags for a Class in the Audio Transcription Ontology Manager

Reviewing Results

Results will be provided as a secure link to a JSON file describing the annotations.

Important note: For security reasons, JSON result links expire 15 days after they are generated. To get links that have not expired, regenerate the result reports.

The objects in the JSON include the following: 

  • For each segment:

    • layerID and ID
      • The universally unique identifier of every segment, along with the ID of the layer it belongs to. 
      • These fields are inherited from the segmentation data. 
    • startTime and endTime
      • Given in seconds, with millisecond precision.
      • These fields are inherited from the segmentation data. 
    • ontologyName
      • This is the class name assigned to the segment. 
      • This field is inherited from the segmentation data.
    • metadata
      • original_text
        • This field records the pre-loaded transcription if the transcription is modified in the review job
      • review_status
          • This field records the review status of each segment. See the “Review Mode” section for more information.
      • transcription
        • text
          • This field will contain the transcription text as entered by the contributor 
        • annotatedBy
          • This will be "Human" if transcribed by a contributor and "Machine" if the transcriptions were populated by a model and left unedited by the contributor. 
    • nothingToTranscribe
      • This will be a Boolean: true or false.
      • This will be true if the contributor has indicated they were unable to transcribe this segment. 
        • Example: The segment does not include any speech. 
  • For the entire audio file:

    • nothingToAnnotate
      • This is inherited from the segmentation data.
      • This will be a Boolean: true or false.
      • This will be true if the contributor has indicated they were unable to segment the audio file in the segmentation job. 
        • Example: The contributor was asked to segment periods of speech, but the audio does not include any speech. 
    • nothingToTranscribe 
      • This will be a Boolean: true or false.
      • This will be true if the contributor has indicated they were unable to transcribe the entire audio file.
        • Example: The audio file does not include any speech.
    • abletoAnnotate
      • This will be a Boolean: true or false.
      • This will be false if the tool was unable to load the audio file.
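Once a result file has been downloaded and loaded (for example with json.load), the fields above can be summarized per class. The sketch below uses the field names listed above, but the nesting (metadata, then transcription, then text/annotatedBy) and the assumption that you have already extracted a list of segment dicts are inferred from the field list, not from a published schema. The sample segments are made up for illustration.

```python
from collections import defaultdict

# Sketch only: field names come from the list above; the nesting
# (metadata -> transcription -> text/annotatedBy) is an assumption.


def transcription_of(segment):
    return segment.get("metadata", {}).get("transcription", {})


def transcribed_seconds_by_class(segments):
    """Total duration, per ontologyName, of segments that received a transcription."""
    totals = defaultdict(float)
    for segment in segments:
        if segment.get("nothingToTranscribe") or not transcription_of(segment).get("text", "").strip():
            continue
        totals[segment["ontologyName"]] += segment["endTime"] - segment["startTime"]
    return dict(totals)


def unedited_machine_segments(segments):
    """Segments whose text was pre-populated by a model and left unedited."""
    return [s for s in segments if transcription_of(s).get("annotatedBy") == "Machine"]


def file_has_nothing_to_transcribe(result):
    """File-level flags: nothing segmented, or nothing transcribable in the file."""
    return bool(result.get("nothingToAnnotate") or result.get("nothingToTranscribe"))


segments = [  # made-up sample data
    {"ontologyName": "Speech", "startTime": 0.0, "endTime": 1.5, "nothingToTranscribe": False,
     "metadata": {"transcription": {"text": "hello", "annotatedBy": "Human"}}},
    {"ontologyName": "Speech", "startTime": 2.0, "endTime": 4.5, "nothingToTranscribe": False,
     "metadata": {"transcription": {"text": "world", "annotatedBy": "Machine"}}},
    {"ontologyName": "Noise", "startTime": 5.0, "endTime": 6.0, "nothingToTranscribe": True,
     "metadata": {"transcription": {"text": "", "annotatedBy": "Human"}}},
]
print(transcribed_seconds_by_class(segments))    # {'Speech': 4.0}
print(len(unedited_machine_segments(segments)))  # 1
```

Using .get with empty-dict defaults keeps the sketch tolerant of segments where a nested field is absent.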

Review Mode

Review mode is an experience designed specifically for quality management.


For job creators

On the job creator’s side, when using the review-data CML parameter, you can also specify a subset so that the tool displays only a random sample of the segments in each audio file. This makes reviewing much more efficient: reviewers still review every audio file, but much more quickly. This feature is ideal if your goal is to estimate the transcription quality of each audio file.

In the review job’s output, you will see two additional fields under “metadata”:

  1. “original_text”: if the reviewer changes a transcription, this field records the original transcription that was loaded as input data. This makes it easy to calculate the word error rate of each unit.
  2. “review_status”: every segment randomly selected for the review job has this attribute set to “reviewed”.
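One way to compute a unit's word error rate from these fields is sketched below, using the standard word-level Levenshtein distance with the reviewer's final text as the reference. It assumes each segment dict carries metadata.review_status, metadata.transcription.text, and metadata.original_text (which, per the field list above, is present only when the reviewer changed the text, so its absence is treated as zero errors). Inline timestamps such as <12.345/> are ignored. Function names are hypothetical.

```python
import re

# Sketch under stated assumptions: review_status, transcription.text and the optional
# original_text live under each segment's "metadata" dict.
TIMESTAMP = re.compile(r"<\d+(?:\.\d+)?/>")


def words(text):
    return TIMESTAMP.sub(" ", text).split()


def word_edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref, hyp = words(reference), words(hypothesis)
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def unit_wer(segments):
    """WER of the original transcriptions, taking the reviewed text as the reference."""
    errors = reference_words = 0
    for segment in segments:
        meta = segment.get("metadata", {})
        if meta.get("review_status") != "reviewed":
            continue  # not part of the reviewed subset
        final_text = meta.get("transcription", {}).get("text", "")
        original = meta.get("original_text", final_text)  # absent means unchanged
        errors += word_edit_distance(final_text, original)
        reference_words += len(words(final_text))
    return errors / reference_words if reference_words else 0.0


segments = [  # made-up sample data
    {"metadata": {"review_status": "reviewed", "original_text": "the bat sat",
                  "transcription": {"text": "the cat sat"}}},
    {"metadata": {"review_status": "reviewed", "transcription": {"text": "on the mat"}}},
    {"metadata": {"transcription": {"text": "not in the subset"}}},
]
print(round(unit_wer(segments), 4))  # 0.1667
```

Summing edits and reference words across segments before dividing weights each segment by its length, rather than averaging per-segment rates.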

Please note that this feature is designed for review jobs with only 1 judgement per row.


For contributors

Review jobs include a “reviewed” button that ensures reviewers go through every segment before they can submit. After checking the transcription of each segment, the reviewer must click the “reviewed” button.


Additional Notes:

This product is in BETA, so please consider the following important notes: 

  1. Reviewing/viewing submitted annotations via the unit page is not currently supported. 
  2. The job must be set up in the Code Editor; the tool is not supported in the Graphical Editor yet. 
  3. Audio Transcription jobs do not support test questions or aggregation at this stage. 
  4. Launching this type of job requires one of our trusted contributor channels. Please reach out to your Customer Success Manager to set this up. 
