How to: Design a Video Object Tracking Job


Data preparation for a Video Object Tracking job is straightforward, but the key points outlined below ensure that the data is processed correctly and can be annotated. To get started, here is what you’ll need:

Source Data

The source data can be either a video or a list of frames. When using videos as source data, the video files should:

  • Be hosted and publicly viewable.
  • Be in MP4 or AVI format.
  • Not exceed 2 GB in size or 35 minutes in duration.
  • Be broken into sensible rows of work for a contributor to complete.
    • As a best practice, we recommend about 100 frames per video. Depending on your frame rate, this works out to clips of roughly 3 to 10 seconds.
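As a rough illustration of the 100-frame recommendation, the clip length follows from dividing the target frame count by the frame rate. The function name and the sample frame rates in this sketch are assumptions chosen for illustration, not part of the platform.

```python
def clip_seconds(fps: float, target_frames: int = 100) -> float:
    """Return the clip length in seconds that yields about target_frames frames."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return target_frames / fps


# Sample frame rates (assumed values, for illustration only).
for fps in (10, 24, 30):
    print(f"{fps} fps -> {clip_seconds(fps):.1f} s per clip")
```

At 30 frames per second this gives a clip of roughly 3.3 seconds, and at 10 frames per second it gives 10 seconds, which is consistent with the range above.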

Here is an example source file:



When using frames as source data, instead of the URL of a video file, you provide the URL of a CSV file containing a list of URLs, each pointing to a single frame. The CSV file contains one URL per line and doesn’t need a header.
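A frame list in this shape could be generated with a few lines of Python. This is a minimal sketch: the frame URLs are hypothetical, and listing them in playback order is an assumption rather than a documented requirement.

```python
import csv
import io


def frame_list_csv(frame_urls):
    """Render one frame URL per line, with no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for url in frame_urls:
        writer.writerow([url])
    return buffer.getvalue()


# Hypothetical frame URLs, for illustration only.
frames = [f"https://example.com/clip-001/frame_{i:04d}.jpg" for i in range(1, 4)]
print(frame_list_csv(frames), end="")
```

The resulting text would then be saved as a CSV file and hosted at its own URL, which is what the job's source file points to.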


Here is an example source file:


Here is an example csv file containing frames of a video file:



Note: Secure Data Access is supported if using frames as source data. Please contact your Customer Success Manager to set up SDA for your video object tracking job.


  1. At least one column, with a column header, containing the link to the source data to be annotated (e.g. “video-url” or “frames”)
    • As needed, you can pass any other metadata along as columns.
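The sketch below builds a job source file with one link column and optional metadata columns. The column name "video-url" comes from the example above; the "camera" column, the URLs, and the helper name are hypothetical.

```python
import csv
import io


def source_csv(rows, url_column="video-url"):
    """Render rows as CSV with the link column first, then any metadata columns."""
    for row in rows:
        if not row.get(url_column):
            raise ValueError(f"every row needs a value for {url_column!r}")
    # Metadata columns are whatever other keys appear in the rows.
    extra = sorted({key for row in rows for key in row} - {url_column})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[url_column, *extra], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows: one clip per row, plus an illustrative metadata column.
rows = [
    {"video-url": "https://example.com/clips/clip-001.mp4", "camera": "front"},
    {"video-url": "https://example.com/clips/clip-002.mp4", "camera": "rear"},
]
print(source_csv(rows), end="")
```

For a frames-based job, the same helper could be called with url_column="frames" and rows that link to the hosted frame-list CSV files.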

An Ontology:

  • You will need to create or upload an ontology with at least one class. To do this, first navigate to the Design > Ontology Manager page.


  1. Make an instructional video
    • Creating a video will help you understand the tool and discover some of the edge cases in your data. It will also give contributors context on how the videos should be annotated.
  2. Provide guidance on how the tool works
    • The video annotation tool has built-in features to help contributors annotate more efficiently, including a full menu of hotkeys and tooltips. Feel free to copy and paste these tips into your instructions:
Function          Hotkey          Tooltip
Pan               Hold Spacebar   Pan (spacebar)
Zoom In           +               Zoom In (+)
Zoom Out          -               Zoom Out (-)
Reframe           r/R             Reframe (R)
Focus Mode        f/F             Focus Mode (F)
Hide Mode         h/H             Hide Mode (H)
Show Fullscreen   e/E             Minimize/Expand (E)
Play/Pause        p/P             Play/Pause (P)
Previous Frame    ←               Prev Frame (←)
Next Frame        →               Next Frame (→)


CML for a Video Object Tracking job is available to our Enterprise Appen users. Please contact your Customer Success Manager for access to this code. The product is in BETA, so please consider the following:

  1. The job needs to be designed in CML; there is currently no graphical editor for this tool.
  2. Launching this job requires one of our trusted video annotation channels. Please reach out to your Customer Success Manager to set this up.
  3. If you need any help, don’t hesitate to reach out to the Appen Platform Support team.


Below are the parameters available for the cml:video_shapes tag. Some are required; others can be omitted.

  • name: the name of your output column
  • source-data: the column header containing the video or set of frames to be annotated
  • use-frames: defaults to “false”. If set to “true”, the tool expects a URL pointing to a list of frames as source data.
  • assistant="linear_interpolation", "object_tracking", or "none":
    • There are two different types of machine assistance to create annotations:
      • Object Tracking is ideal when:
        • The camera is moving
        • Note: This is only available for bounding boxes. If any other shapes are being used, your job must use Linear Interpolation. 
      • Linear Interpolation is ideal when:
        • The objects are moving in a linear fashion
        • The camera is stationary
        • The objects being tracked are small and often change in size
      • Configure assistant="none" when no interpolation between frames is desired. This may be helpful for a review or QA job.
  • type: accepts a comma-separated array of any of the supported shape types.
    • Example: type="['box','polygon','dot','line','ellipse']"
  • review-data: This is an optional parameter that will be the column header containing pre-created annotations for a video. The format must match the output of the video shapes tool (JSON in a hosted URL).
  • task-type: Set task-type="qa" when designing a review or QA job. This parameter must be used in conjunction with review-data. See [this article] for more details.
  • require-views: This is an optional parameter that accepts 'true' or 'false'
    • If 'false', contributors are not required to view every frame of the video before submitting.
  • allow-frame-rotation (optional):
    • Accepts true or false.
    • If true, contributors can rotate the video frames within the video annotation tool. Contributors click a toolbar icon to turn on a rotation slider that adjusts the rotation angle from 0 to 359 degrees. Contributors do not have to manually rotate every frame; the rotation angle is linearly interpolated between manually rotated frames, and frames after the last manually rotated frame inherit its degree of rotation.
    • Defaults to false if the attribute is not present.
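The constraints described in the parameter list above can be checked before a job is built. The following is a minimal sketch, not an official validator: the function name is hypothetical, and treating name and source-data as the required attributes is an assumption, since the list does not say which parameters are mandatory.

```python
ASSISTANTS = {"linear_interpolation", "object_tracking", "none"}
SHAPES = {"box", "polygon", "dot", "line", "ellipse"}


def check_video_shapes(attrs, shapes):
    """Return a list of problems found in a cml:video_shapes configuration."""
    problems = []
    # Assumption: name and source-data are the required attributes.
    for required in ("name", "source-data"):
        if not attrs.get(required):
            problems.append(f"missing {required}")
    assistant = attrs.get("assistant")
    if assistant is not None and assistant not in ASSISTANTS:
        problems.append(f"unknown assistant: {assistant}")
    unknown = [shape for shape in shapes if shape not in SHAPES]
    if unknown:
        problems.append(f"unknown shape types: {unknown}")
    # Object Tracking is only available for bounding boxes.
    if assistant == "object_tracking" and any(shape != "box" for shape in shapes):
        problems.append("object_tracking supports bounding boxes only")
    # task-type="qa" must be used together with review-data.
    if attrs.get("task-type") == "qa" and not attrs.get("review-data"):
        problems.append("task-type qa requires review-data")
    return problems


# Hypothetical configuration, for illustration only.
config = {"name": "annotation", "source-data": "video-url", "assistant": "object_tracking"}
print(check_video_shapes(config, ["box", "polygon"]))
```

An empty list means none of the rules covered by this sketch were violated; it does not guarantee that the platform will accept the job.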


Figure 1: Demonstration of video frame rotation interpolation. The contributor manually rotates frame 7, and frames 2–6 are machine rotated, interpolating the relative rotation between frames 1 and 7.
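The behaviour shown in the figure can be sketched in Python as follows. This only illustrates linear interpolation between manually rotated frames and is not the tool's actual implementation. It assumes that frame 1 is unrotated unless set manually, and it does not handle wrap-around between 359 and 0 degrees.

```python
def interpolate_rotation(manual, frame_count):
    """Return a rotation angle in degrees for frames 1..frame_count.

    manual maps 1-based frame numbers to manually set angles.
    """
    keys = {frame: float(angle) for frame, angle in manual.items()}
    keys.setdefault(1, 0.0)  # Assumption: frame 1 starts unrotated.
    marks = sorted(keys)
    angles = []
    for frame in range(1, frame_count + 1):
        if frame >= marks[-1]:
            # Frames after the last manual rotation inherit its angle.
            angles.append(keys[marks[-1]])
            continue
        left = max(mark for mark in marks if mark <= frame)
        right = min(mark for mark in marks if mark > frame)
        share = (frame - left) / (right - left)
        angles.append(keys[left] + share * (keys[right] - keys[left]))
    return angles


# The contributor rotates frame 7 to 60 degrees (an assumed value).
for number, angle in enumerate(interpolate_rotation({7: 60}, 9), start=1):
    print(f"frame {number}: {angle:.0f} degrees")
```

In this example, frames 2 to 6 receive evenly spaced angles between frame 1 and frame 7, and frames 8 and 9 keep the angle set on frame 7.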



  • Necessary settings:
    • 1 row per page
    • 1 judgment per row
    • At least 3 hours per assignment
      • This can be set via the API using the following command or by contacting the Appen Platform Support team.
        • Set Time Per Assignment
        • curl -X PUT --data-urlencode "job[options][req_ttl_in_seconds]={n}" "https://api.appen.com/v1/jobs/{job_id}.json?key={api_key}"
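For teams that script their job setup, the same request could be prepared in Python with only the standard library. This sketch mirrors the curl command above and converts hours to seconds; the job ID and API key are placeholders, the helper name is hypothetical, and the host is assumed to be api.appen.com. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ttl_request(job_id, api_key, hours):
    """Build (but do not send) the PUT request that sets time per assignment."""
    seconds = int(hours * 3600)
    body = urlencode({"job[options][req_ttl_in_seconds]": seconds}).encode("ascii")
    query = urlencode({"key": api_key})
    url = f"https://api.appen.com/v1/jobs/{job_id}.json?{query}"
    return Request(url, data=body, method="PUT")


# Placeholder job ID and API key, for illustration only.
request = build_ttl_request(job_id=123456, api_key="YOUR_API_KEY", hours=3)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
# To send it, pass the request to urllib.request.urlopen(request).
```

Three hours corresponds to 10800 seconds, which is the value this sketch places in the request body.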

Considerations When Using Videos as Source Data

Unlike a typical Appen job, a Video Object Tracking job pre-processes the linked video data once the job is launched. While this is occurring, each row will be in the “preprocessing” state before becoming “judgable”.

If the unit cannot be preprocessed it will be automatically canceled. This is to prevent contributors from seeing a broken tool and collecting annotations on incorrectly formatted data. Some common reasons a video row may be canceled are:

  • The video file is too large or contains too many frames
  • The URL provided does not lead to a visible video file - either the permissions are incorrect or the file is otherwise corrupted

If this occurs and you’re able to identify and correct the issue, you can re-upload the video and order a judgment on the new rows.
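Some cancellations can be avoided by checking files against the documented limits before uploading. The sketch below is one way to do that; it is not a platform feature. It assumes the file size and duration are already known, reads "2 GB" as 2 × 1024³ bytes, and infers the format from the file extension, which is only a heuristic.

```python
MAX_BYTES = 2 * 1024**3  # Assumption: "2 GB" read as 2 GiB.
MAX_SECONDS = 35 * 60
ALLOWED_EXTENSIONS = (".mp4", ".avi")


def preflight_problems(url, size_bytes, duration_seconds):
    """Return reasons a video might fail preprocessing, based on the stated limits."""
    problems = []
    path = url.split("?", 1)[0].lower()  # Ignore any query string.
    if not path.endswith(ALLOWED_EXTENSIONS):
        problems.append("not an MP4 or AVI file")
    if size_bytes > MAX_BYTES:
        problems.append("larger than 2 GB")
    if duration_seconds > MAX_SECONDS:
        problems.append("longer than 35 minutes")
    return problems


# Hypothetical file details, for illustration only.
print(preflight_problems("https://example.com/clips/long.mov", 3 * 1024**3, 40 * 60))
```

A check like this cannot detect permission problems or corrupted files, so rows can still be canceled after launch.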

Note: Preview of the job and tool will not work prior to launch. The frames need preprocessing before they can be loaded, and preprocessing does not begin until the job is launched.

