Use Workflows to automatically route your unlabeled data across multiple jobs in Appen. This feature connects jobs according to the routing rules that you define:
- Route all rows via linear or branched configurations
- Route rows via answer-based configuration
- Route rows randomly
- Route rows via confidence scores of a column
Note: Workflows currently supports use cases that produce aggregated responses. Jobs containing cml:text, cml:textarea, or cml:checkboxes are not officially supported at this time.
Note: For Dedicated customers, this feature is not currently available On-Premises. It is available only via the cloud multi-tenant Appen Data Annotation Platform.
Glossary
- Workflow
- A data annotation process containing multiple data operators (i.e. Jobs)
- Operator
- A step in a workflow that produces output, which can be routed to another step in the workflow.
- Connect Canvas
- Enables users to easily connect data operators and configure routing rules.
- Routing Rule
- Use routing rules to filter operator output for specific rows to proceed to the next step.
- Branch
- The splitting of data across multiple operators.
- Workflow Report
- The combined results of all operators in a workflow.
Before You Start
- Mock up and test run the jobs you want to include in your workflow before automating them.
- Make new copies of jobs that will be included in the workflow.
- Make sure the job you copy from has the aggregation method set to "agg" in the report options for all fields. This ensures your rows are routed properly.
- Add Test Questions
- Jobs should always contain test questions to ensure quality. It is critical to do this before launching your workflow. Once you have finalized your workflow design, add test questions to the jobs in your workflow using any of the following techniques:
- Create Test Questions from high confidence rows. See this guide for more information.
- Copy an existing job that already contains test questions.
Creating a Workflow
1. Go to the Workflows page. From here you can view existing workflows, or create new workflows.
Figure 1. How to Create a New Workflow
2. After you have created and named your new workflow, you will be redirected to the Canvas tab to Design the workflow:
- Add your first operator to the workflow by clicking the empty tile
- Note: In workflows, jobs are synonymous with operators
- The operator panel opens and lists your jobs
- There are some restrictions on which jobs are eligible for use in a workflow:
- You will not be able to use any jobs you do not own in your workflow
- You may only add jobs that have not yet been launched. Jobs running or already in use by other workflows will not be available.
- You may only use a job once in a single workflow.
Figure 2: Canvas Page of Workflow
- Configuring Routing Rules
- After adding two operators, you will be asked to configure routing rules. You can select any of the following rules:
- Route All Rows - route all rows directly to the next operator. When using this option you will not be able to use branching.
- Route by Column Headers - route rows based on a column filter.
- First, select the column header you want to filter on
- Second, select a filter type:
- Equals
- Does not equal
- Contains
- Is greater than
- Is greater than or equal to
- Is less than
- Is less than or equal to
- Third, enter the answer value of the question set in the column header
- Note: This should exactly match the 'value' specified in your CML and is case sensitive.
- Route a Random Sample - Automatically send a randomly selected "n%" of rows from one operator to the next, based on your desired sample size, to streamline and automate various QA processes.
- Route by Column Confidence - Route rows according to whether the confidence of the column header of interest meets your desired threshold, so the workflow can qualify rows automatically.
- This routing option supports any column using agg, cagg_x, or agg_x aggregation.
- If desired, add multiple conditions to the rule to fine-tune how rules are handled.
- Note: you cannot add both 'AND' and 'OR' conditions to the same rule at the same time
Figure 2. Configuring Routing Rules Between Jobs
Figure 3. Example CML With 'values' To Be Used In The Routing Rules
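The routing rules described above can be sketched in Python to show how they behave on a few rows. This is only an illustration, not the platform's implementation: the function names, the filter identifiers, and the sample column names (`sentiment`, `confidence`) are all hypothetical, and the assumption that the ordering filters compare numerically is ours.

```python
import operator
import random

# Hypothetical identifiers for the filter types listed above.
FILTERS = {
    "equals": operator.eq,
    "does_not_equal": operator.ne,
    "contains": lambda cell, value: value in cell,
    "greater_than": operator.gt,
    "greater_than_or_equal": operator.ge,
    "less_than": operator.lt,
    "less_than_or_equal": operator.le,
}
# Assumption: ordering filters are applied to numeric values.
NUMERIC_FILTERS = {
    "greater_than", "greater_than_or_equal", "less_than", "less_than_or_equal",
}


def condition_matches(row, column, filter_type, value):
    """Evaluate one condition; string comparisons are case sensitive."""
    cell = row[column]
    if filter_type in NUMERIC_FILTERS:
        return FILTERS[filter_type](float(cell), float(value))
    return FILTERS[filter_type](cell, value)


def rule_matches(row, conditions, combine="and"):
    """Combine conditions with a single connective: all 'and' or all 'or'."""
    if combine not in ("and", "or"):
        raise ValueError("a rule uses either 'and' or 'or', not both")
    results = [condition_matches(row, *c) for c in conditions]
    return all(results) if combine == "and" else any(results)


def sample_rows(rows, percent, seed=None):
    """Pick roughly percent% of rows at random (random-sample routing)."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * percent / 100))


row = {"sentiment": "positive", "confidence": "0.92"}
rule = [
    ("sentiment", "equals", "positive"),
    ("confidence", "greater_than_or_equal", "0.8"),
]
print(rule_matches(row, rule, combine="and"))
```

Because the comparison is case sensitive, a rule value of "Positive" would not match a row whose answer is "positive", which is why the value must match the CML 'value' exactly.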
- Setting up branching operators
- Create workflow branches to route the output of one operator to several destination operators
- Branching becomes available once you add a second-level operator to your workflow.
- Note: Currently, we do not support merging the output of branches back together into one operator.
Figure 4. Workflow Structure With Branching
- 3. Source data
- Uploading data to a workflow is similar to uploading data to a job. All data in a workflow must pass through the first operator, so in a sense, you can think of uploading data to a workflow as uploading data to that first operator
- Filenames should not contain special characters, including apostrophes (!,.*&'# etc.).
- Dataset Requirement
- Should contain the columns referenced by the Liquid tags in your operators. In the upload modal, we'll display the tags detected from the first operator as a reminder of the data being labelled.
- All column headers should be in lowercase and contain no special characters, including apostrophes (!,.*&'# etc.), except for underscores used to replace spaces.
- As with jobs, there is a 250k row limit for workflow uploads.
Note: Unlike source data uploaded directly to an individual job, column headers in source data uploaded to a workflow are not automatically normalized (lowercased, with spaces replaced by underscores). They are left exactly as they appear in the source file. Account for this difference when using Liquid to reference source data columns in CML, so that data is displayed as expected.
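Since headers are not normalized on upload, it can help to check them locally first. The following is a minimal sketch, assuming the rules stated above (lowercase, underscores in place of spaces, no special characters). The function names are hypothetical, and the choice to drop every character outside `a-z`, `0-9`, and `_` is our own reading of "no special characters".

```python
import re


def normalize_header(header):
    """Lowercase, replace whitespace with underscores, drop other characters."""
    header = header.strip().lower()
    header = re.sub(r"\s+", "_", header)
    return re.sub(r"[^a-z0-9_]", "", header)


def invalid_headers(headers):
    """Return the headers that would change under normalization."""
    return [h for h in headers if h != normalize_header(h)]


headers = ["image_url", "Image URL", "Reviewer's Note!"]
for h in invalid_headers(headers):
    print(f"{h!r} should be renamed to {normalize_header(h)!r}")
```

Running a check like this before upload makes it easier to keep the Liquid tags in your CML and the headers in your source file in sync.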
Fig 5. Workflow Data page
- 4. Review the workflow before launch
- On the launch page, we will display a summary of the operators in your workflow and highlight a few important items:
- Price per Judgment and Judgments per Row for each operator
- Estimated Maximum Cost
- This is intended to provide a max contributor cost estimate if all uploaded rows run through all operators
- Available Funds
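As a rough illustration of how a maximum contributor cost could be reasoned about, the sketch below multiplies price per judgment by judgments per row for each operator and assumes every uploaded row passes through every operator. This formula is an assumption made for illustration; the figure shown on the launch page may include other factors, and the field names are hypothetical. Prices are kept in integer cents to avoid floating-point rounding.

```python
def estimated_max_cost_cents(operators, rows):
    """Upper bound on contributor cost if every row reaches every operator."""
    return sum(
        op["price_per_judgment_cents"] * op["judgments_per_row"] * rows
        for op in operators
    )


# Hypothetical two-operator workflow.
operators = [
    {"price_per_judgment_cents": 5, "judgments_per_row": 3},
    {"price_per_judgment_cents": 10, "judgments_per_row": 1},
]
total = estimated_max_cost_cents(operators, rows=100)
print(f"${total / 100:.2f}")
```

In practice, routing rules usually send only a subset of rows to later operators, so the actual cost is often lower than this bound.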
- 5. Launch the workflow
- As with jobs, we recommend testing your workflow with 100 rows before ordering a large number of rows. To do this, select "Order rows and Launch".
- Note: Data operators and routing rules cannot be edited after a workflow is launched
- 6. Workflow Reports
- After the initial test run, review your workflow report, which contains data from all operators based on your filtering rules. Confirm that all rows in the test run were routed correctly before launching the remaining rows in the workflow.
- The workflow report will contain some new columns not found in the job report:
data_line_id
- This identifier follows the row from the first operator to the finalized workflow report. It is a lot like unit_id and can be used to track results and where a row was routed.
row_ingested_at
- This is the timestamp at which the row was uploaded to the workflow.
j{job_id}:{column_header}:agg
- For every operator in your workflow, you will see a column containing the Job ID and the question being answered. This column contains the value of the aggregated answer chosen in the operator.
j{job_id}:{column_header}:confidence
- This will be the confidence of the aggregated answer.
- For image annotation jobs, you will also see the following columns:
j{job_id}:{annotation}
- This will contain the aggregated annotation for the row
Fig 6. Example Workflow Report
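The column naming scheme above lends itself to a small script for checking where each row was routed. The sketch below is illustrative only: the job IDs, column headers, timestamp format, and sample values are made up, and it assumes that a row which never reached an operator has empty cells in that operator's columns.

```python
import csv
import io
import re

# Matches j{job_id}:{rest}, following the naming scheme described above.
COLUMN_PATTERN = re.compile(r"^j(?P<job_id>\d+):(?P<rest>.+)$")


def parse_column(name):
    """Split a report column into (job_id, column_header, kind), or None."""
    match = COLUMN_PATTERN.match(name)
    if not match:
        return None
    job_id, rest = int(match.group("job_id")), match.group("rest")
    for kind in ("agg", "confidence"):
        if rest.endswith(":" + kind):
            return job_id, rest[: -len(kind) - 1], kind
    return job_id, rest, None  # e.g. an annotation column


def answers_by_row(report_csv):
    """Map data_line_id to the aggregated answer from each operator reached."""
    result = {}
    for row in csv.DictReader(io.StringIO(report_csv)):
        answers = {}
        for name, value in row.items():
            parsed = parse_column(name)
            # Assumption: empty cell means the row was not routed to that job.
            if parsed and parsed[2] == "agg" and value:
                answers[parsed[0]] = (parsed[1], value)
        result[row["data_line_id"]] = answers
    return result


# Made-up report with two hypothetical jobs, 111 and 222.
REPORT = (
    "data_line_id,row_ingested_at,j111:sentiment:agg,j111:sentiment:confidence,"
    "j222:review:agg,j222:review:confidence\n"
    "1,2024-01-01T00:00:00Z,positive,0.95,approved,1.0\n"
    "2,2024-01-01T00:00:00Z,negative,0.6,,\n"
)
for line_id, answers in answers_by_row(REPORT).items():
    print(line_id, sorted(answers))
```

Grouping by data_line_id in this way gives a quick view of which operators each row passed through during a test run.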