
Guide to: Running an Audio Transcription Job

Overview

The cml:audio_transcription tag allows users to create an audio transcription job with custom labels and tag sets.

Note: For Dedicated customers, this feature is currently not available On-Premises and only available via the cloud multi-tenant Appen Data Annotation Platform.


Fig. 1: Audio Transcription tool interface for Contributors


Building a Job

Data

  • The audio transcription tool supports the transcription of .wav, .mp3, and .ogg file types, as well as .mp4 and .mov (see video parameter, below)
  • Your data must be CORS configured.
  • All data access control features are supported for this tool, including Secure Data Access.

CML

As this tool is in open beta, there is no Graphical Editor support yet. For access to the tool's CML gem, please reach out to your Appen contact.

Parameters

Below are the parameters available for the job design. Some are required in the element, while others are optional.

  • type (required - as of 16 January 2023)
    • transcription - enables the transcription field, as well as the tags and timestamps.
      • timestamps additionally require allow-timestamping (see below)
    • labeling - enables the labels, as configured in the ontology.
    • play-only - only the audio player (and video, if configured) will be available to contributors.
      • This type can only be used by itself. For example: type="['play-only']"
    • none - if no type is configured, only the audio player will be available to contributors.
    • Examples:
      • type="['labeling', 'transcription']" - allows labeling and transcription (including tags/timestamps)
      • type="['labeling', 'segmentation']" - allows labeling and segmentation (including tags/timestamps)
    • A note about type and its interaction with review-data (see below): When you load data for review into the tool using review-data, all annotations provided will be visible, including transcriptions, tags, and labels. Use type to control which parts of the data a contributor can edit. For example, if you use type="['transcription']" and your review data contains both transcriptions and labels, contributors will be able to see both, but they will only be able to edit the transcriptions.
  • source-data (required)
    • The column header from your source data containing the audio URLs to be annotated.
  • name (required)
    • The results header where the annotations will be stored.
  • segments-data (optional)

    • The column header from your source data containing the audio segmentation data (the start and end timestamps of each segment).

      • The tool uses this data to create the transcription box for each segment.

      • The tool expects the data to be in the format described in the 'Accepted Segmentation Input Schemas' section at the end of this article.

      • If you do not have segmentation data, omit this parameter.

  • label (optional)
    • The question label that the contributors will see.
  • validates (optional)
    • validates="required"
      • Defines whether or not the element is required to be answered
      • Defaults to not required if not present
      • Defaults to "required" if this is the only CML tag present in the job design, as there must be at least one required element
    • validates="timestamp-direction"
      • checks whether the timestamps are in the right order upon submission
      • regardless of the specified text direction or language, the waveform always runs left to right; therefore, if timestamps are not placed in left-to-right (chronological) order, submission will be blocked and contributors will encounter an error
    • to use both validators, separate them with a space:
      • validates="required timestamp-direction"
  • review-data (optional)
    • This will read in existing transcriptions on an audio file.
    • If used, a source column is required containing links to transcriptions formatted as output by the audio transcription tool (see the format in the 'Reviewing Results' section below).
    • This parameter may be used to do a peer review or model validation job.
    • Please see the “Review Mode” section for more details
    • You can use raw text input as your review-data (e.g. for prompt-audio validation) as long as you have no other annotation data as input.

    • As mentioned above, when you load data for review into the tool using review-data, all annotations provided will be visible, including transcriptions, tags, and labels; use type to control which parts of the data a contributor can edit.
  • subset (optional)
    • This parameter allows you to set up the tool to display only a subset of all the segments in each unit
    • Only use this if the “review-data” parameter is present
    • Accepts a value from 0 to 1
    • Defaults to 1
    • See the “Review Mode” section for more details
  • force-fullscreen (optional)
    • Accepts 'true' or 'false'.
    • If 'true', a page of work contains a preview of each data row. When a contributor clicks to open a data row, the audio transcription tool loads into a fullscreen view.
    • If 'false', a page of work contains a view of the audio transcription tool for each data row. The contributor can open a fullscreen view of the tool at their discretion by clicking an icon in the top right corner of the tool or using a hotkey.
    • Defaults to 'false'.
  • task-type (optional)
    • Please set task-type="qa" when designing a review or QA job. This parameter needs to be used in conjunction with review-data. See this article for more details.
  • allow-timestamping (optional)

    • set allow-timestamping="true" to enable the timestamping functionality in your transcription task

      • contributors will see the “add timestamp” button in each transcription box that allows them to insert timestamps within their transcription

      • timestamps can be used to generate more granular text/audio alignment and/or to allow contributors to correct and improve the segmentation points in the source data

      • timestamps appear in the output data like this: "this is a <12.345/> transcription"

    • Note: This parameter defaults to "false" if not declared

  • text-direction (optional)

    • Set text-direction="rtl" to specify that the language you are transcribing is written from right-to-left (e.g. Arabic). This will ensure that any tags and timestamps are placed in the correct sequential location in the text.

    • It is recommended to use this in combination with the timestamp-direction validator described above
    • If the tags themselves are in English or another left-to-right language (e.g. </noise>), they will continue to be displayed in the correct direction

    • Note: this parameter defaults to "ltr" if not explicitly defined.
  • video (optional)

    • Set video="true" and beta="true" to enable display of video along with the audio.

    • Ensure that your data is in one of the supported formats: .mp4 or .mov

    • Video data must include an audio track to ensure the tool is usable.
    • Note: this parameter defaults to "false" if not explicitly defined. If your data is .mp4 or .mov, the tool will play audio, but the video will not be displayed.
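
Putting several of these parameters together, a minimal job design might look like the sketch below. The column names ({{audio_url}}, {{segments}}), the output name, and the label text are placeholders, and the exact attribute syntax should be confirmed against the tool's CML gem (available from your Appen contact, as noted above):

<cml:audio_transcription
  source-data="{{audio_url}}"
  segments-data="{{segments}}"
  name="audio_transcription_output"
  type="['labeling', 'transcription']"
  label="Transcribe each segment of the audio"
  allow-timestamping="true"
  validates="required timestamp-direction"
/>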



Ontology

The audio transcription ontology is where you define the metadata that transcribers will use to label and tag audio files or segments.

You can access the ontology by clicking the 'Manage Audio Transcription Ontology' link that appears in the right corner of the job's Design Page.


Fig 2: Audio Transcription Ontology Manager

The top-level metadata defined in the audio transcription ontology consists of labels, event tags, and span tags.

  • Labels

    • Contributors will apply your labels at the segment level.

    • Labels are defined as members of label groups. Groups are just a way to keep related labels together according to common attributes and rules; groups themselves are not metadata to be labeled.

    • Label groups require:

      • a name

      • at least one label inside them.

        • Labels must be unique, even between groups.

    • At the “label group” level, we can define:

      • if selecting a label from the group is mandatory

      • if users can select multiple labels from the group

      • if the group is not transcribable

        • By default, we assume a segment is transcribable and show the transcribable labels.

          • In this case, mandatory|transcribable applies.

        • If a segment is marked as “nothing to transcribe”, only non-transcribable labels should be available.

          • In this case, mandatory|non-transcribable applies.

          • The transcription box is not available.

    • You do not need to create any labels (if you need no labels, you do not need to create any groups), but any labels you do include must be inside a group.

  • Event Tags

    • Event tags are optional.

    • Event tags require a name.

    • Event tags are initially displayed in alphabetical order; however, you can customize their order of presentation by dragging the tags into your preferred arrangement.

    • You may also provide a description for each tag, which will be visible to the contributor in the tool when they click on the info icon for that tag.

  • Event Groups
    • Event groups are optional

    • Event groups require a name

    • Event groups are initially displayed in alphabetical order; however, you can customize their order of presentation by dragging the groups into your preferred arrangement.

    • You can assign event tags to event groups using a drop-down within the event tag, or by dragging an existing event tag into an event group.

  • Span Tags

    • Span tags are optional.

    • Span tags require a name.

    • Span tags are initially displayed in alphabetical order; however, you can customize their order of presentation by dragging the tags into your preferred arrangement.

    • You may also provide a description for each tag, which will be visible to the contributor in the tool when they click on the info icon for that tag.

  • Span Groups

    • Span groups are optional

    • Span groups require a name

    • Span groups are initially displayed in alphabetical order; however, you can customize their order of presentation by dragging the groups into your preferred arrangement.

    • You can assign span tags to span groups using a drop-down within the span tag, or by dragging an existing span tag into a span group.


Reviewing Results

Results will be provided as a secure link to a JSON file describing the annotations.

Important note: For security reasons, JSON result links expire 15 days after generation. To receive non-expired result links, please re-generate the result reports.

The objects in the JSON include the following: 

  • For each segment:

    • id
      • The universally unique identifier of every segment.
    • startTime and endTime
      • This will be displayed in seconds, to the millisecond. 
      • These fields are inherited from the segmentation data. 
    • labels
      • This field will contain the labels as indicated by the contributor.
    • transcription
      • This field will contain the transcription text as entered by the contributor. Tags and timestamps will also appear in the transcription field.
  • For the entire audio file:
    • nothingToTranscribe
      • This will be Boolean true or false
      • This will be true if the contributor has indicated they were unable to transcribe the entire audio file.
    • ableToAnnotate
      • This will be Boolean true or false
      • This will be false if the tool was unable to load the audio file.

Annotation Schema

interface AudioToolNewOutput {
  annotation: {
    segments: {
      // main information about the segment
      id: string; // prefixed with "segment" so as not to confuse this with an annotation id
      startTime: number;
      endTime: number;

      // The following is kept as extra info until the layers feature is removed from the tool entirely;
      // otherwise users would see layers on first load but not after loading judgment or autosaved data
      // (on page refresh), and this inconsistency would cause confusion.
      layerId: string;

      labels: string[]; // present if 'type' includes labeling & 'task-type' is labeling or qa
      transcription: string[]; // present if 'type' includes transcription & 'task-type' is labeling or qa

      metadata: {
        // all other info for the segment
        comment?: string; // can be present in any qa judgment
        feedbackAcknowledged: boolean; // comes from an acknowledgment task
      };
    }[];
  };

  ableToAnnotate: boolean;
  nothingToTranscribe: boolean; // not renamed to nothingToAnnotate since this field is not used for anything else
}
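
For reference, here is a hypothetical example of what a single unit's output could look like under this schema. The segment id, times, layer id, label, and transcription text are all illustrative placeholders, not values produced by any real job:

// Illustrative example only - all values below are placeholders.
const exampleOutput: AudioToolNewOutput = {
  annotation: {
    segments: [
      {
        id: "segment-1", // placeholder segment id
        startTime: 0.0,
        endTime: 4.25,
        layerId: "layer-0", // placeholder
        labels: ["speech"], // a label defined in your ontology
        transcription: ["this is a <2.1/> sample transcription"],
        metadata: { feedbackAcknowledged: false },
      },
    ],
  },
  ableToAnnotate: true,
  nothingToTranscribe: false,
};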

Review Mode

We have created an experience specifically designed for quality management.

 

For job creators

On the job creator’s side, when using the task-type="qa" and review-data CML parameters, you can also specify a subset, so that the tool displays only a random sample of all the segments in each audio file. Reviewers still review each audio file, but much more quickly. This feature is ideal if your goal is to get an idea of the transcription quality of each audio file.
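
As a rough sketch, a review job design combining these parameters might look like the following. The column names ({{audio_url}}, {{audio_transcription_output}}), the output name, and the subset value are placeholders, and the exact attribute syntax should be confirmed against the tool's CML gem:

<cml:audio_transcription
  source-data="{{audio_url}}"
  review-data="{{audio_transcription_output}}"
  task-type="qa"
  subset="0.3"
  type="['transcription']"
  name="audio_transcription_review"
  validates="required"
/>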

In the review job’s output, you will see two additional fields under “metadata”:

  1. “original_text”: if the transcription is changed by the reviewer, this field records the original transcription that was loaded as input data. This field makes it easy to calculate the word error rate of each unit.
  2. “review_status”: all segments that were randomly selected for the review job will have this attribute set to “reviewed”.
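
For illustration, the additional review metadata for one reviewed segment might look like the following; the original_text value is a placeholder:

// Illustrative example only - the original_text value is a placeholder.
const exampleReviewMetadata = {
  original_text: "the transcription that was loaded as input data",
  review_status: "reviewed",
};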

Please note that this feature is designed for reviewing jobs with only 1 judgment per row.

 

For contributors

We have added a “reviewed” button for review jobs, which ensures that reviewers must go through every single segment before being able to submit. For each segment, the reviewer needs to click the “reviewed” button after reviewing its transcription.


Additional Notes:

This product is in BETA, so please consider the following important notes: 

  1. Reviewing/viewing submitted annotations via the unit page is not currently supported. 
  2. The job must be set up in the Code Editor; the tool is not supported in the Graphical Editor yet. 
  3. Audio Transcription jobs do not support test questions or aggregation at this stage. 
  4. Launching this type of job requires one of our trusted contributor channels. Please reach out to your Customer Success Manager to set this up. 

Accepted Segmentation Input Schemas

OLD:

interface SegmentInput {
  id: string;
  startTime: number;
  endTime: number;
  ontologyName?: string;
  layerId?: string;
}

interface SegmentsDataInput {
  annotation: SegmentInput[][];
  nothingToAnnotate: boolean;
}

 

NEW:

interface SegmentInput {
  id: string;
  startTime: number;
  endTime: number;
  ontologyName?: string;
  layerId?: string;
}

interface AudioAnnotationData {
  annotation: SegmentInput[][];
  nothingToAnnotate: boolean;
}

type SegmentsDataInput = SegmentInput[] | AudioAnnotationData;

  • Segments input data is not required
  • The old input format (cml:audio_annotation output) is partially supported by the tool. Segment boundaries will be respected but ontology classes and names will not work in the new tool or ontology format.
  • If you have segments or other pre-annotations to display, they should be in the above format.
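
For illustration, a minimal segments-data value in the NEW (flat array) format might look like the following; the ids and times are placeholders to be replaced with your own segmentation data:

// Illustrative example only - ids and times are placeholders.
const exampleSegmentsData: SegmentsDataInput = [
  { id: "segment-1", startTime: 0.0, endTime: 4.25 },
  { id: "segment-2", startTime: 4.25, endTime: 9.8 },
];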
