Guide to: Running an Audio Transcription Job


Appen's audio transcription tool allows users to transcribe audio files based on a custom ontology. 

Note: For Dedicated customers, this feature is currently not available On-Premises and only available via the cloud multi-tenant Appen Data Annotation Platform.


Fig. 1: Audio Transcription tool interface for Contributors

Building a Job


  • The audio transcription tool supports the transcription of .wav, .mp3, and .ogg file types.
  • The tool currently supports audio files up to 20 minutes in duration and segments as short as 1 second.
  • Your data must be CORS configured.
  • All data access control features are supported for this tool, including Secure Data Access.


As this tool is in open beta, there is no Graphical Editor support yet. For access to the tool's CML gem, please reach out to your Customer Success Manager.


Below are the parameters available for the job design. Some are required in the element, while some are optional.

  • audio-url (required)
    • The column header from your source data containing the audio URLs to be annotated.
  • name (required)
    • The results header where the annotations will be stored.
  • audio-annotation-data (required)
    • The column header from your source data containing the audio segmentation data (the start and end timestamps of each segment).
      • The tool uses this data to create the transcription box for each segment.
      • The tool expects the data to be in the same format as the output of Appen's audio annotation tool.
        • An example file of this formatted data is attached at the bottom of this article.
  • label (optional)
    • The question label that the contributors will see.
  • validates (optional)
    • Defines whether or not the element is required to be answered.
    • Accepts 'required'
    • Defaults to not required if not present.
  • review-from (optional)
    • This will read in existing transcriptions on an audio file.
    • If used, your source data must include a column of links to existing transcriptions in the same format the audio transcription tool outputs (see the 'Reviewing Results' section below).
    • This parameter may be used to do a peer review or model validation job.
  • force-fullscreen (optional)
    • Accepts 'true' or 'false'.
    • If 'true', a page of work contains a preview of each data row. When a contributor clicks to open a data row, the audio transcription tool loads into a fullscreen view.
    • If 'false', a page of work contains a view of the audio transcription tool for each data row. The contributor can open a fullscreen view of the tool at their discretion by clicking an icon in the top right corner of the tool or using a hotkey.
    • Defaults to 'false'.
  • intervals (optional)
    • Accepts 'true' or 'false'.
    • If 'true', the transcription tool displays large tick marks every 1 second and small tick marks every 100 milliseconds on top of the audio waveform. These are useful whenever the contributor needs to judge time durations in order to annotate events in the audio. Currently only supported for audio files up to ~10 minutes long.
    • If 'false', no tick marks are displayed.
    • Defaults to 'false'.

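Putting these parameters together, a job's CML might look like the sketch below. The element name (`cml:audio_transcription`) and the source column names (`audio_url`, `annotation_data`) are assumptions for illustration only; confirm the exact gem name and syntax with your Customer Success Manager.

```xml
<!-- Hypothetical sketch: the element and column names are illustrative, not confirmed -->
<cml:audio_transcription
  audio-url="{{audio_url}}"
  name="audio_transcription"
  audio-annotation-data="{{annotation_data}}"
  label="Transcribe each segment of the audio"
  validates="required"
  force-fullscreen="true"
  intervals="true"
/>
```

Optional parameters such as review-from can be added in the same way when running a peer review or model validation job.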

Audio transcription jobs use a custom ontology that is specific to the audio transcription tool.

You can access the ontology by clicking the link to 'Manage Audio Transcription Ontology' that appears on the right corner of the job's Design Page.

Important note: The job's ontology should be set up to exactly match the ontology of the audio segmentation data; in other words, each class present in the audio segmentation job should be included in the audio transcription ontology accordingly.



Fig. 2: How to find the Ontology Manager via the Design Page

Fig. 3: The Ontology Manager

Every class in the ontology will have the following attributes:

  • Title
    • This is the class name that will appear in the results and to the contributors working in the job.
      • Important note: the casing of the class must match the casing of the class in the audio segmentation job ontology.
  • Transcribe this layer
    • If segmentation has been performed for a class but transcription of that class is not desired, disable transcription by deselecting the 'transcribe this layer' checkbox.
    • If disabled, segments and transcription bubbles will be absent from the contributor annotation interface. 
      • Example: If segmentation has been performed to mark periods of 'noise', it may make sense to disable transcription for this class.
  • Description
    • Descriptions can be used to give clear instructions to contributors on what you want to be annotated with the class.
  • Color
    • The color of each class.
    • Unlike other annotation tools, the platform auto-assigns a color to each class from a palette of 16 pre-selected, easy-to-read colors.
    • If you would like to add custom colors, you may upload your ontology manually with a CSV file.
      • A sample CSV file with accepted ontology format is attached to the bottom of this article.
  • Spans and Events
    • You can optionally enable the ability for the contributor to add span and events tags to their transcriptions by adding one or more span or event labels in the ontology editor for a class.
    • You may also provide a description for each span and event tag, which will be visible to the contributor in the tool when they click on the info icon for that span or event.
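If you upload the ontology via CSV to set custom colors, the file maps each class to its attributes. The column headers below are assumptions based on the attributes described above; refer to the sample CSV attached to this article for the authoritative format.

```csv
title,description,color
Speaker 1,Transcribe all speech from the primary speaker,#1F77B4
Speaker 2,Transcribe all speech from the secondary speaker,#FF7F0E
Noise,Background noise; transcription disabled for this class,#7F7F7F
```

Remember that each title must exactly match the casing of the corresponding class in the audio segmentation job's ontology.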

Important note: The tool can only support a maximum of 8 layers of audio class instances. If a class has multiple instances, each instance counts as one layer. The class instances beyond 8 layers cannot be displayed or transcribed.


Fig. 4: Editing a Class in the Audio Transcription Ontology Manager


Fig. 5: Enabling Span and Event Tags for a Class in the Audio Transcription Ontology Manager

Reviewing Results

Results will be provided as a secure link to a JSON file describing the annotations.

Important note: For security reasons, JSON result links expire 15 days after generation. To obtain fresh links, re-generate the result reports.

The objects in the JSON include the following: 

  • For each segment:

    • layerID and ID
      • The universally unique identifier of every segment, along with the ID of the layer it belongs to. 
      • These fields are inherited from the segmentation data. 
    • startTime and endTime
      • This will be displayed in seconds, to the millisecond. 
      • These fields are inherited from the segmentation data. 
    • ontologyName
      • This is the class name assigned to the segment. 
      • This field is inherited from the segmentation data.
    • metadata
      • transcription
        • text
          • This field will contain the transcription text as entered by the contributor.
        • annotatedBy
          • This will be "Human" if transcribed by a contributor and "Machine" if the transcriptions were populated by a model and left unedited by the contributor. 
    • nothingToTranscribe
      • This will be a Boolean: true or false.
      • This will be true if the contributor has indicated they were unable to transcribe this segment. 
        • Example: The segment does not include any speech. 
  • For the entire audio file:

    • nothingToAnnotate
      • This is inherited from the segmentation data.
      • This will be a Boolean: true or false.
      • This will be true if the contributor has indicated they were unable to segment the audio file in the segmentation job. 
        • Example: The contributor was asked to segment periods of speech, but the audio does not include any speech. 
    • nothingToTranscribe 
      • This will be a Boolean: true or false.
      • This will be true if the contributor has indicated they were unable to transcribe the entire audio file.
        • Example: The audio file does not include any speech.
    • ableToAnnotate
      • This will be a Boolean: true or false.
      • This will be false if the tool was unable to load the audio file.
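Based on the fields above, a result file might look like the sketch below. The exact nesting and key casing (for example, whether the segments sit under an `annotation` wrapper, and whether the identifier keys are spelled `id`/`layerId`) are assumptions for illustration; only the field names documented in this section are taken from the tool's output.

```json
{
  "annotation": {
    "segments": [
      {
        "id": "segment-uuid-placeholder",
        "layerId": "layer-uuid-placeholder",
        "startTime": 1.250,
        "endTime": 4.875,
        "ontologyName": "Speaker 1",
        "metadata": {
          "transcription": {
            "text": "Hello, thanks for calling.",
            "annotatedBy": "Human"
          }
        },
        "nothingToTranscribe": false
      }
    ],
    "nothingToAnnotate": false,
    "nothingToTranscribe": false,
    "ableToAnnotate": true
  }
}
```

The same format is expected as input when using the review-from parameter for peer review or model validation jobs.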

Additional Notes:

This product is in BETA, so please consider the following important notes: 

  1. Reviewing/viewing submitted annotations via the unit page is not currently supported. 
  2. The job must be set up in the Code Editor; the tool is not supported in the Graphical Editor yet. 
  3. Audio Transcription jobs do not support test questions or aggregation at this stage. 
  4. Launching this type of job requires one of our trusted contributor channels. Please reach out to your Customer Success Manager to set this up. 
