Guide to: Text Annotation Test Questions

Test Questions are rows with known answers that are randomly inserted throughout your job. The Text Annotation tool lets you adjust the level of leniency and customize evaluation for each individual test question. Things to note:

  1. Annotate the text according to the rules specified in the job's instructions.
    • If the data is pre-annotated and is wrong, correct the annotations in the same way that contributors should in the task
    • Multiple acceptable classes can be assigned to each token, although each contributor can only provide one class per token
    • Each token can be part of multiple spans
  2. Token threshold can be adjusted on the left-hand side.
    • The default setting will require 100% accuracy if there are between 1 and 4 tokens annotated. With 5+ tokens annotated, the default introduces leniency.
  3. If no annotations are needed, include an option to hide the annotation tool

*Note: The 'Convert finalized rows to Test Questions' feature is not currently available for the text annotation tool.


  • Span - a set of tokens (1 or more) with an assigned class label - the output of a model or a contributor judgment
  • Merge Span - The act of adding one or more tokens to a span

Figure 1. Merging the span together

  • Split Span - The act of taking a span made of 2 or more tokens and dividing them back into their own tokens


Figure 2. Splitting the span
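
The definitions above can be illustrated with a minimal sketch. The `Span` class and the `merge_span` and `split_span` helpers are hypothetical names chosen for illustration and are not part of the tool. The sketch assumes a span is stored as a range of token positions, that merging extends the span to the right, and that splitting keeps the original class label on each resulting token.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    start: int   # index of the first token in the span (inclusive)
    end: int     # index one past the last token in the span (exclusive)
    label: str   # class label assigned to the span


def merge_span(span: Span, extra_tokens: int) -> Span:
    """Add one or more tokens to the right-hand side of a span."""
    if extra_tokens < 1:
        raise ValueError("merging adds at least one token")
    return Span(span.start, span.end + extra_tokens, span.label)


def split_span(span: Span) -> List[Span]:
    """Divide a span of 2 or more tokens back into single-token spans.

    Assumption: each resulting token keeps the original class label.
    """
    if span.end - span.start < 2:
        raise ValueError("only spans of 2 or more tokens can be split")
    return [Span(i, i + 1, span.label) for i in range(span.start, span.end)]


tokens = ["John", "Smith", "Foundation"]
merged = merge_span(Span(0, 1, "organization"), 2)
print(" ".join(tokens[merged.start:merged.end]))  # prints "John Smith Foundation"
print(len(split_span(merged)))                    # prints 3
```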


Understanding How Contributors Are Evaluated Against Test Questions 


Figure 3. How Contributors are Evaluated


Contributors are evaluated against two aspects of each test question.

  1. Spans (all or nothing)
    1. In order to pass the test question, contributors must correctly merge all tokens that are merged in the spans of the test question answer
    2. If the contributor fails to merge two tokens that are meant to be merged, or merges two tokens that should not be merged, the test question will be missed regardless of the classes assigned to the tokens

  2. Classes (token threshold)
    1. The contributor must correctly annotate at least the number of tokens specified by the test question’s token threshold
    2. The default setting will require 100% accuracy if there are between 1 and 4 tokens annotated. With 5 or more tokens annotated, we introduce leniency as the default by requiring 75% of the tokens (rounded down) to be correct.
      • Note: You may always customize how many tokens a contributor needs to pass
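
The default threshold rule described above can be written as a small function. This is a sketch of the arithmetic only; `default_token_threshold` is a hypothetical name, and returning 0 for a test question with no annotated tokens is an assumption, since the article only describes the cases of 1 or more tokens.

```python
def default_token_threshold(annotated_tokens: int) -> int:
    """Number of tokens a contributor must get right under the default setting."""
    if annotated_tokens < 0:
        raise ValueError("token count cannot be negative")
    if annotated_tokens <= 4:
        # 1 to 4 annotated tokens: 100% accuracy is required.
        # Assumption: 0 annotated tokens yields a threshold of 0.
        return annotated_tokens
    # 5 or more annotated tokens: 75% of the tokens, rounded down.
    # Integer arithmetic avoids floating-point rounding surprises.
    return (annotated_tokens * 3) // 4


for n in (1, 4, 5, 8, 10):
    print(n, default_token_threshold(n))
```

For example, a test question with 5 annotated tokens requires 3 correct tokens by default (75% of 5 is 3.75, rounded down), while one with 4 annotated tokens requires all 4.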


Figure 4. Token Threshold


Test Question Answers, Spans, and Tokens

Each token is part of a single span

  • A single span, as the words "John Smith Foundation" are merged together
  • Single-token spans, as the words "John", "Smith", and "Foundation" are each their own span
  • Each word is assigned 2 different classes, but the words "John", "Smith", and "Foundation" are each an individual single-token span

  • Spans: To pass, spans provided by contributors must match 100%
  • Tokens: To pass, the contributor must correctly assign classes to >= X tokens

Tokens are part of >1 span

  • This test question contains >1 span, as "John Smith" and "John Smith Foundation" are categorized as separate spans and classes

  • Spans: Spans will not be evaluated
  • Tokens: To pass, the contributor must correctly assign classes to >= X tokens

*X = token threshold
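
The two cases above can be combined into one pass/fail decision. The following is a rough sketch rather than the tool's actual implementation. The function names are hypothetical, and spans are assumed to be `(start, end)` token-index pairs with an exclusive end and no class labels, so a token that carries two classes on the same span still counts as being in a single span.

```python
from collections import Counter
from typing import Iterable, Tuple

SpanRange = Tuple[int, int]  # (start, end) token indices, end exclusive


def has_tokens_in_multiple_spans(spans: Iterable[SpanRange]) -> bool:
    """True if any token position is covered by more than one distinct span."""
    counts = Counter(i for start, end in set(spans) for i in range(start, end))
    return any(c > 1 for c in counts.values())


def passes_test_question(
    answer_spans: Iterable[SpanRange],
    contributor_spans: Iterable[SpanRange],
    correct_tokens: int,
    token_threshold: int,
) -> bool:
    answer = set(answer_spans)
    # Spans are all or nothing, but only when every token in the
    # test question answer belongs to a single span.
    if not has_tokens_in_multiple_spans(answer):
        if set(contributor_spans) != answer:
            return False
    # Classes: at least X tokens must be correct (X = token threshold).
    return correct_tokens >= token_threshold


# "John Smith Foundation" is one merged span; the contributor left "Foundation" unmerged.
print(passes_test_question({(0, 3)}, {(0, 2), (2, 3)}, 3, 3))          # prints False
# "John Smith" and "John Smith Foundation" overlap, so spans are not evaluated.
print(passes_test_question({(0, 2), (0, 3)}, {(0, 2), (2, 3)}, 3, 3))  # prints True
```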



*Note: Only tokens with assigned classes in the test question answer will be compared against the contributor’s answer. If the “none” class is used when creating the test question, then false positives (any class assignment other than “none”) will be considered incorrect.
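
As an illustration of the note above, the count of correct tokens might be computed along these lines. `count_correct_tokens` is a hypothetical helper, not part of the tool. The sketch assumes the test question answer maps each annotated token position to its set of acceptable classes, that the contributor provides one class per token, and that a token the contributor leaves unannotated is treated as "none".

```python
from typing import Dict, Set


def count_correct_tokens(
    answer: Dict[int, Set[str]],
    contributor: Dict[int, str],
) -> int:
    """Count tokens whose contributor class is one of the acceptable classes."""
    correct = 0
    # Only tokens with assigned classes in the test question answer are compared.
    for position, acceptable_classes in answer.items():
        # Assumption: a token the contributor did not annotate counts as "none".
        given = contributor.get(position, "none")
        if given in acceptable_classes:
            correct += 1
    return correct


answer = {
    0: {"person", "organization"},  # "John": two acceptable classes
    1: {"person", "organization"},  # "Smith"
    2: {"organization"},            # "Foundation"
    3: {"none"},                    # explicitly marked as "none" in the test question
}
contributor = {0: "person", 1: "person", 2: "organization", 3: "location", 4: "location"}

# Token 3 is a false positive against "none"; token 4 is not in the answer and is ignored.
print(count_correct_tokens(answer, contributor))  # prints 3
```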



