Guide to: Running a Text Relationships Job – Appen Success Center

Overview

The Text Relationships tool (cml:text_relationships) allows users to create a job that annotates relationships between spans of text with a custom ontology.

Note: As this tool is currently in its Beta phase, please contact your Customer Success Manager to gain early access.

Note: For Dedicated customers, this feature is currently not available On-Premises; it is only available via the cloud multi-tenant Appen Data Annotation Platform.

Figure 1: Text Relationships Tool via Preview Page

Glossary

Span - a string of text with an assigned class label; the output of a model or contributor judgment.
Relationship - consists of two spans (from a span and to a span) and the relation between them.
Relation - the name/type of the relation between two spans as defined in the job's ontology.
From span - the starting span in a relationship.
To span - the ending span in a relationship.
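
To make the glossary terms concrete, here is a minimal Python sketch that models a span and a relationship as plain data classes. The field names (text, class_name, relation) and the sample values are assumptions chosen for illustration; they are not the platform's own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    text: str        # the string of text covered by the span
    class_name: str  # the class label assigned to the span


@dataclass(frozen=True)
class Relationship:
    from_span: Span  # the starting span in the relationship
    to_span: Span    # the ending span in the relationship
    relation: str    # name/type of the relation, as defined in the job's ontology


# Hypothetical example: a "born_in" relation from a Person span to a Location span.
born_in = Relationship(
    from_span=Span(text="Marie Curie", class_name="Person"),
    to_span=Span(text="Warsaw", class_name="Location"),
    relation="born_in",
)
print(f"{born_in.from_span.text} -[{born_in.relation}]-> {born_in.to_span.text}")
```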

Upload Data

The source data of a Text Relationships job can come from two different sources:

The output of a previous text annotation job on the Appen platform
- The output of a previously run text annotation job on the Appen platform can be uploaded to a text relationships job directly, without any modification.
Data created externally
- A source file containing externally created data can be uploaded to a text relationships job, but it must use the same JSON format as the output of a text annotation job on the Appen platform, and the JSON must be hosted at a URL in a CORS-configured bucket.

Note: There is an example file attached below on how to format source data.

Build a Job

Parameters

Below are the parameters available for the text relationships tool. Some are required in the element; the others are optional.

source-data (required)
- The name of the source data column containing the data to be annotated
name (required)
- The result header where the result links will be stored
context-column (optional)
- The name of the source data column containing the context of each data row
review-data (optional)
- The name of the source data column containing pre-annotated text relationships
task-type (optional)
- Please set task-type="qa" when designing a review or QA job. This parameter needs to be used in conjunction with review-data. See this article for more details.
direction (optional)
- Renders text in a specific direction
- Accepts rtl and ltr for right-to-left and left-to-right scripts respectively
- Defaults to left-to-right if not set
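
The rules in this list can be checked before the CML is saved. The sketch below is an illustration only: it validates a dictionary of attribute values against the rules above and then renders them as a single element. The helper names (check_attributes, render_element) and the column names are hypothetical, and the self-closing element form is an assumption; this guide does not show the exact value syntax the platform expects, so values are passed through unchanged.

```python
from xml.sax.saxutils import quoteattr

REQUIRED = ("source-data", "name")


def check_attributes(attrs):
    """Return a list of problems found in a dict of attribute values."""
    problems = []
    for key in REQUIRED:
        if not attrs.get(key):
            problems.append(f"missing required attribute: {key}")
    # task-type="qa" needs to be used in conjunction with review-data.
    if attrs.get("task-type") == "qa" and not attrs.get("review-data"):
        problems.append('task-type="qa" must be used together with review-data')
    # direction accepts only rtl and ltr; ltr is the default when it is not set.
    if attrs.get("direction", "ltr") not in ("ltr", "rtl"):
        problems.append("direction must be ltr or rtl")
    return problems


def render_element(attrs):
    """Render the attributes as one element (assumed self-closing form)."""
    problems = check_attributes(attrs)
    if problems:
        raise ValueError("; ".join(problems))
    rendered = " ".join(f"{key}={quoteattr(value)}" for key, value in attrs.items())
    return f"<cml:text_relationships {rendered} />"


# Hypothetical column names, used only for illustration.
print(render_element({"source-data": "text_annotation", "name": "relationships"}))
print(check_attributes({"source-data": "text_annotation", "name": "relationships",
                        "task-type": "qa"}))
```

The second call returns one problem, because task-type="qa" is set without review-data.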

Ontology

The Ontology Manager allows job owners to create and edit the ontology within a Text Relationships job. Text Relationships jobs require an ontology to launch.
When the CML for a text relationships job is saved, the Ontology Manager link will appear at the top right of the Design page.
The ontology of a Text Relationships job allows you to define relationship restrictions for each span class.
The ontology of a Text Relationships job can be copied from a Text Annotation job or another Text Relationships job, via download and upload.

Figure 2: Ontology Manager for Text Relationships

Ontology Manager Best Practices

An ontology can contain up to 1,000 classes. As a best practice, however, we recommend not exceeding 16 classes in a job, so that contributors can understand and process the different classes.
Choose from 16 pre-selected colors or upload custom colors as hex codes via the CSV ontology upload.

Important Note: If no relationship restriction is defined for a class, the class will not be able to relate to any other class in the job.
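
As a rough illustration of how these rules interact, the sketch below treats relationship restrictions as an allow-list of (from class, relation, to class) triples and reports any class that appears in no triple and so cannot take part in a relationship. The triple representation, the class names, and the helper names are assumptions made for this example; they do not reflect the Ontology Manager's CSV format.

```python
MAX_CLASSES = 1000         # limit stated in this guide
RECOMMENDED_CLASSES = 16   # recommended maximum per job

# Hypothetical ontology: class names plus allowed (from, relation, to) triples.
classes = ["Person", "Organization", "Location", "Date"]
allowed = {
    ("Person", "works_for", "Organization"),
    ("Person", "born_in", "Location"),
    ("Organization", "based_in", "Location"),
}


def is_allowed(allowed, from_class, relation, to_class):
    """True only if this exact triple was defined as a restriction."""
    return (from_class, relation, to_class) in allowed


def unrelatable_classes(classes, allowed):
    """Classes that appear in no restriction and so cannot relate to anything."""
    used = {c for from_class, _, to_class in allowed for c in (from_class, to_class)}
    return sorted(set(classes) - used)


def size_warning(classes):
    """Return a message if the class count breaks the limit or the recommendation."""
    if len(classes) > MAX_CLASSES:
        return "ontology exceeds the 1,000 class limit"
    if len(classes) > RECOMMENDED_CLASSES:
        return "more than 16 classes; contributors may struggle"
    return None


print(is_allowed(allowed, "Person", "born_in", "Location"))
print(unrelatable_classes(classes, allowed))
print(size_warning(classes))
```

In this sample, "Date" has no restriction defined, so it is reported as unable to relate to any other class.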

Nested Spans

The Text Relationships tool (cml:text_relationships) supports nested spans. However, there is one restriction on creating relationships between nested spans.

Consider this example:

Here we have one root span and two sub spans.

A relationship can be created between the two sub spans, but a relationship cannot be created between the root span and either of its sub spans.
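
The restriction can be expressed as a simple containment check. The sketch below assumes each span is described by character offsets (start inclusive, end exclusive); that representation, the sample text, and the helper names are assumptions for illustration, not the tool's internal logic.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)
    label: str


def contains(outer, inner):
    """True if inner lies inside outer and the two ranges are not identical."""
    same_range = (outer.start, outer.end) == (inner.start, inner.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_range


def can_relate(a, b):
    """A root span and one of its sub spans cannot be related; siblings can."""
    return not contains(a, b) and not contains(b, a)


text = "New York City Hall"
root = Span(0, 18, "Building")    # "New York City Hall"
sub_a = Span(0, 13, "City")       # "New York City"
sub_b = Span(14, 18, "Facility")  # "Hall"

print(can_relate(sub_a, sub_b))  # two sub spans
print(can_relate(root, sub_a))   # root span and one of its sub spans
```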

Results

Results are links to a JSON file that contains a list of relationships.
The links are found in the Full or Aggregated report under the column header that was specified as the value for the name attribute.
Result links will expire 15 days after generation; to access results after the links have expired, you will need to re-generate the result report.
Each relationship instance is an array of five attributes:
- id: the unique ID of each relationship instance
- name: the class name of the relation
- from_span: contains the details of from_span
- to_span: contains the details of to_span
- annotated_by: indicates whether a relationship instance is pre-loaded into the job or manually added by an annotator
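
Once a result file has been downloaded, it can be inspected with a few lines of Python. The sketch below assumes the file is a JSON list in which each relationship is an object keyed by the five attribute names above. The sample values, the layout inside from_span and to_span, and the annotated_by values ("machine", "human") are hypothetical placeholders, not actual platform output.

```python
import json
from collections import Counter

EXPECTED_KEYS = {"id", "name", "from_span", "to_span", "annotated_by"}

# Hypothetical sample; real files are downloaded from the result links.
sample = """
[
  {"id": "r1", "name": "born_in",
   "from_span": {"text": "Marie Curie"}, "to_span": {"text": "Warsaw"},
   "annotated_by": "human"},
  {"id": "r2", "name": "works_for",
   "from_span": {"text": "Marie Curie"}, "to_span": {"text": "Sorbonne"},
   "annotated_by": "machine"},
  {"id": "r3", "name": "born_in",
   "from_span": {"text": "Alan Turing"}, "to_span": {"text": "London"},
   "annotated_by": "human"}
]
"""


def load_relationships(raw):
    """Parse the JSON text and check that every item has the five attributes."""
    relationships = json.loads(raw)
    for item in relationships:
        missing = EXPECTED_KEYS - item.keys()
        if missing:
            raise ValueError(f"relationship {item.get('id')} is missing {sorted(missing)}")
    return relationships


relationships = load_relationships(sample)
by_relation = Counter(item["name"] for item in relationships)
by_annotator = Counter(item["annotated_by"] for item in relationships)
print(dict(by_relation))
print(dict(by_annotator))
```

Because result links expire 15 days after generation, it may be worth saving the downloaded JSON locally before running this kind of analysis.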

Downloadable Files: