Guide to: Running an Image Transcription Job

Our image transcription tool allows users to combine image annotation and transcription in one job, simplifying their workflow.

Building a Job


If you use our OCR assistance feature, your images must be CORS configured so that predictions can be made on the text. Please view our Guide to CORS configuring an S3 bucket for assistance.


Currently, there is no Graphical Editor support for this tool. Here is sample CML to build the job:

<cml:image_transcription type="['box']" image-url="{{image_url}}" validates="required" ontology="true" name="annotation" label="Annotate this image" crosshair="true" box-threshold="0.7" class-threshold="0.7" />


Below are the parameters available for the job design. Some are required in the element, some are optional.

  • type (required)
    • The shape used in the job; currently accepts only ‘box’
  • image-url (required)
    • The column from your source data that contains the image URLs to be annotated.
  • name (required)
    • The results header where annotations will be stored.
  • label (required)
    • The question label contributors will see.
  • validates (optional)
    • Whether or not this element is required to be answered.
    • Accepts ‘required’
    • Defaults to not required if not present
  • ontology (optional)
    • The list of classes to be labeled in an image - view this article to learn how to create your custom ontology.
    • Accepts a boolean
    • Defaults to ‘false’ if not present
  • review-from (optional)
    • This will read in existing annotations on an image. The format must match the output shown below. The following is required:
      • 'id'
        • A randomly-generated, 32-character UUID
      • 'class'
        • The class from the ontology
      • 'type'
        • This is the shape type, which is 'box'
      • 'instance'
        • The shape class instance, which loads in the ontology sidebar
      • 'coordinates'
        • The coordinates for the bounding box
      • 'metadata'
        • This includes the following:
          • inputType
            • For transcription, this will always be ‘text’
          • text
            • This is the transcription for the box
          • type
            • This is the shape type, which is 'box'
      • Example: [{"id":"677706c8-f405-4a2c-9be1-1b6f4c5042a2","class":"Business Name","instance":1,"metadata":[{"inputType":"text","text":"Figure Eight"}],"type":"box","coordinates":{"x":250,"y":177,"w":26,"h":15}}]
  • box-threshold (optional)
    • The minimum overall bounding box IoU required for a contributor to pass a test question.
    • Accepts a decimal value between 0.1 and 0.99.
  • class-threshold (optional)
    • The minimum percentage of correct classes applied to boxes in a test question for a contributor to be considered correct.
    • Accepts a decimal value between 0.1 and 0.99.
    • The formula is classes correct / (total classes correct + incorrect).
    • Example: the class-threshold is set to 0.7 and a test question contains 10 ground truth boxes. A contributor gets 8 out of 10 classes correct for a score of 80% and would be considered correct for that test question.
  • crosshair (optional)
    • Enables a crosshair that indicates the cursor location
    • Accepts a boolean
    • Defaults to ‘false’ if not present
  • ocr (optional)
    • When set to 'true', this enables OCR transcription assistance in the tool.
    • This feature must be enabled for your team for access and is not included in every subscription plan; please contact your Customer Success Manager or Account Executive for more information.
  • output-format (optional)
    • Accepts ‘json’ or ‘url’.
    • If ‘json’, the report column for contributors' annotations holds the annotation data as a JSON string.
    • If ‘url’, the report column for contributors' annotations holds links to files. Each file contains the annotation data for a single data row in JSON format.
    • Defaults to 'json'.
  • language (optional)
    • This can only be used when ocr="true"
    • Accepts a liquid variable; the column in your source data must contain an ISO 639-1 code
    • The supported languages and their codes are the following:
      • 'af': 'Afrikaans', 'ar': 'Arabic', 'cs': 'Czech', 'da': 'Danish', 'de': 'German', 'en': 'English', 'el': 'Greek', 'es': 'Spanish', 'fi': 'Finnish', 'fr': 'French', 'ga': 'Irish', 'he': 'Hebrew', 'hi': 'Hindi', 'hu': 'Hungarian', 'id': 'Indonesian', 'it': 'Italian', 'jp': 'Japanese', 'ko': 'Korean', 'nn': 'Norwegian', 'nl': 'Dutch', 'pl': 'Polish', 'pt': 'Portuguese', 'ro': 'Romanian', 'ru': 'Russian', 'sv': 'Swedish', 'th': 'Thai', 'tr': 'Turkish', 'zh': 'Chinese', 'vi': 'Vietnamese', 'zh-sim': 'Chinese (Simplified)', 'zh-tra': 'Chinese (Traditional)'
    • If an invalid or unsupported ISO code is passed in from the source data, the in-tool OCR will default to English and will not recognize non-English letters or diacritics
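The class-threshold formula from the parameter list can be worked through in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes a score equal to the threshold counts as passing, since the threshold is described as a minimum.

```python
def class_score(correct: int, incorrect: int) -> float:
    """Class score: classes correct / (classes correct + incorrect)."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("at least one classified box is required")
    return correct / total


def passes_class_threshold(correct: int, incorrect: int, threshold: float) -> bool:
    """True when the class score reaches the class-threshold (assumed inclusive)."""
    if not 0.1 <= threshold <= 0.99:
        raise ValueError("class-threshold must be between 0.1 and 0.99")
    return class_score(correct, incorrect) >= threshold


# The example from the parameter list: 8 of 10 classes correct, threshold 0.7.
print(class_score(correct=8, incorrect=2))          # 0.8
print(passes_class_threshold(8, 2, threshold=0.7))  # True
```

With the same threshold, a contributor who gets 6 of 10 classes correct scores 60% and would not pass.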


Fig. 1: Example of the image transcription tool built in CML, as shown on the Unit Page

Creating Test Questions

In BETA, test questions are only partially supported. You may test on the boxes and the classes, but not the transcriptions.

  1. On the Quality Page, click 'Create Test Questions'.
  2. Add boxes around the text in the way specified in the job's instructions.
  3. If no annotations are needed, make sure the job includes an option, such as a single checkbox, to hide the annotation tool.
  4. Save the test question.

Reviewing Test Questions

  1. Select a test question from the Quality Page.
  2. From the image annotation sidebar, click 'Find a Judgment' and choose a contributor ID from the drop-down menu.
  3. Edit, create, or remove the test question annotations based on the feedback. Judgments are color-coded based on whether they match the gold responses.
    • Each box has its own matching metrics, which can be seen by hovering over a contributor judgment or golden shape. A notification appears in the top left corner of the image, displaying a score from zero to one based on the intersection over union formula. If using an ontology, the class match is also displayed.
    • All scores on images are averaged and compared to the test question threshold set in the job design. The overall matching score is then displayed in the left sidebar of the tool.
  4. Save any edits that are made to update the evaluation of the existing contributors' work and ensure any future attempts to answer the test question will be properly evaluated.
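To make the box scores above more concrete, here is a minimal Python sketch of intersection over union for two boxes in the x, y, w, h format used by this tool. It is not the tool's actual implementation. It assumes x and y mark the top-left corner of the box, that each gold box has already been paired with a contributor box, and that a score equal to the threshold passes. The function names are hypothetical.

```python
from statistics import mean


def box_iou(a: dict, b: dict) -> float:
    """Intersection over union of two boxes given as {"x", "y", "w", "h"}."""
    # Right and bottom edges (assumes x, y is the top-left corner).
    a_right, a_bottom = a["x"] + a["w"], a["y"] + a["h"]
    b_right, b_bottom = b["x"] + b["w"], b["y"] + b["h"]
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0, min(a_right, b_right) - max(a["x"], b["x"]))
    inter_h = max(0, min(a_bottom, b_bottom) - max(a["y"], b["y"]))
    inter = inter_w * inter_h
    union = a["w"] * a["h"] + b["w"] * b["h"] - inter
    return inter / union if union > 0 else 0.0


def passes_box_threshold(pairs: list, box_threshold: float) -> bool:
    """Average the IoU of already-matched (gold, judgment) pairs and compare."""
    if not pairs:
        return False
    return mean(box_iou(g, j) for g, j in pairs) >= box_threshold


gold = {"x": 100, "y": 100, "w": 100, "h": 50}
judgment = {"x": 110, "y": 100, "w": 100, "h": 50}
print(round(box_iou(gold, judgment), 3))               # 0.818
print(passes_box_threshold([(gold, judgment)], 0.7))   # True
```

A contributor box shifted 10 pixels to the right of a 100 x 50 gold box still scores about 0.82, which is why a box-threshold around 0.7 tolerates small placement differences.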


Fig. 2: Reviewing Test Question Judgments

Monitoring and Reviewing Results

As this is a BETA feature, aggregation is not supported. Jobs should be run either to a trusted partner or in a peer review workflow. To set up a peer review workflow, use the review-from parameter outlined above.
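For a peer review workflow, the source data needs a column holding existing annotations in the review-from format. The Python sketch below shows one way such a value could be assembled. The helper name make_annotation is hypothetical, and the structure simply mirrors the review-from example given in the parameter list.

```python
import json
import uuid


def make_annotation(cls, instance, text, x, y, w, h):
    """Build one box annotation with the fields review-from requires."""
    return {
        "id": str(uuid.uuid4()),  # randomly generated UUID
        "class": cls,             # must be a class from the job's ontology
        "instance": instance,
        "metadata": [{"inputType": "text", "text": text}],
        "type": "box",            # the only shape currently accepted
        "coordinates": {"x": x, "y": y, "w": w, "h": h},
    }


annotations = [
    make_annotation("Business Name", 1, "Figure Eight", 250, 177, 26, 15),
]

# Compact JSON string to store in the source data column read by review-from.
review_from_value = json.dumps(annotations, separators=(",", ":"))
print(review_from_value)
```

Because the review-from format matches the job output, the annotation column from a first-pass job can usually be carried over to the review job in the same shape.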


  • Example output from an image transcription job:

[{"id":"33b639f1-010e-4b1d-9274-7b49acd20dde","class":"Company Name","metadata":[{"inputType":"text","label":"Transcription","modelType":"ocr","text":"Figure Eight"}],"type":"box","coordinates":{"x":114,"y":61,"w":132,"h":35}}]

  • Most fields are described under the ‘review-from’ parameter above; the remaining field you’ll see in the output is ‘modelType’
    • This will always be 'ocr' for now
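When output-format is 'json', each cell of the annotation column is a JSON string like the one above. As a rough sketch of how it could be processed after downloading a report, the Python below pulls out the class, transcription, and coordinates of every box. The function name extract_transcriptions is hypothetical.

```python
import json

# One cell from the annotation column of a report (output-format 'json').
raw = (
    '[{"id":"33b639f1-010e-4b1d-9274-7b49acd20dde","class":"Company Name",'
    '"metadata":[{"inputType":"text","label":"Transcription",'
    '"modelType":"ocr","text":"Figure Eight"}],"type":"box",'
    '"coordinates":{"x":114,"y":61,"w":132,"h":35}}]'
)


def extract_transcriptions(raw_json: str) -> list:
    """Return (class, text, coordinates) for each transcribed box."""
    rows = []
    for shape in json.loads(raw_json):
        for entry in shape.get("metadata", []):
            if entry.get("inputType") == "text":
                rows.append(
                    (shape.get("class"), entry.get("text", ""), shape["coordinates"])
                )
    return rows


for cls, text, coords in extract_transcriptions(raw):
    print(cls, "|", text, "|", coords)
```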

Reviewing Results

To review the results of your job, you can do the following:

  1. Go to the Data page.
  2. Click on a unit ID.
  3. In the sidebar of the annotation tool, select an option from the drop-down menu.
    • You’ll see different contributor IDs, which allow you to view individual annotations.
  4. Click on a box to view its transcription. 

