Guide to: Running a Smart Text Collection Job – Appen Success Center

Overview

Smart Text includes a number of features that help ensure high-quality, customized, and original data: disabling copy/paste, minimum and maximum word counts, robust spelling and grammar checks, rich text, and AI content detection. Smart Text smooths the contributors' writing experience and ensures high-quality output, especially for jobs related to LLMs, such as creative writing prompt/response pairs and response improvement.

In addition to disabling pasting, outlined below, Smart Text is compatible with the Basic Validators, such as word and character counts (described in this article), and the Smart Validators, such as regex and spelling & grammar (described in this article).

Smart Text autosaves every ten seconds, ensuring nothing is lost if contributors leave their task or encounter a crash.

Job design

From the sidebar, choose "Smart Text".

Disable Pasting

Once you have chosen Smart Text, you will see a "Disable Pasting" checkbox. When pasting is disabled (disable-pasting="true"), contributors cannot paste information into the input text box, regardless of where it comes from (another judgment, another document on their desktop, their browser…). Copy/paste is disabled for right-click, hotkeys, and keyboard shortcuts.

Rich Text Editor

You are now able to design jobs using a Rich Text Editor (RTE). Using our RTE will enable your contributors to format their input text with the following:

Tables
Code blocks with syntax highlighting for HTML, SQL, Java, JavaScript, and more
Math/science equation formatting using syntax for LaTeX
Bold text
Underlined text
Italicized text
Bulleted lists
Numbered lists

When using Smart Text, Rich Text is enabled by default. To disable it, untick the checkbox in the graphical editor or change the default CML attribute to rich="false".

Note: The "Undo" capability is currently only supported when rich="true". To undo, contributors can click the back arrow button or press Command+Z on their keyboard.

⚠️ Note: As a known limitation, the AI Chat Feedback Tool should NOT be configured and used in the same job design as a Smart Text element that uses review-data. Doing so causes issues in the AI Chat Feedback Tool's prompt input box.

Parameters

rich="true" (optional, defaults to "true"):
- This parameter enables rich text formatting in the Smart Text box.
review-data="{{review_data_column}}" and task-type="qa" (optional):
- This parameter enables the loading of an annotation within the smart text tool. When the contributor loads the judgment, they will see a pre-annotation in the tool and have the option to make changes before submitting.
- For the smart text tool, review-data must reference data in specific formats. Supported formats include:
  - Raw text, HTML, or Markdown within the dataset column
  - .txt files
  - .html files
  - .md files
  - A CDS reference pointing to plain text or one of the above file formats
equation="true" (optional, defaults to "false"):
- This parameter allows contributors to type in LaTeX syntax using a dollar sign ($) as a wrapper, which will render the LaTeX automatically within the input box of the tool. Contributors can also copy and paste content correctly into the text box, with the content rendering automatically.
- Note: to include a column in your output that translates everything in the smart text box to LaTeX syntax, refer to the raw-output parameter below.
raw-output="true" (optional, defaults to "false"):
- If raw-output="true", the output will include HTML, Markdown, and LaTeX columns along with the raw output.
- If raw-output="false", the output will only include raw text.
raw-output="['HTML', 'Markdown', 'Latex']":
- Use one or more of these options to include only certain columns in the output.
- Note: For LaTeX to be output, make sure equation="true".
model-annotation="CML_MODEL_NAME" (optional):
- This parameter allows you to present a model response within the cml:smart_text element. Learn more in this article.
read-mode="true" (optional, defaults to "false"):
- When enabled, contributors will not be able to edit the content within the text box. This mode is intended for presenting information to contributors using the review-data parameter.
- By default, read-mode is set to false.
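
As a rough illustration of how the parameters above combine, the Python sketch below assembles a cml:smart_text tag from keyword arguments and rejects a raw-output list that asks for LaTeX without equation="true". The helper name smart_text_tag is hypothetical, the check is an assumption based on the note above rather than documented platform behavior, and a real job will likely need further attributes that are not covered here.

```python
from xml.sax.saxutils import quoteattr


def smart_text_tag(**params):
    """Render a cml:smart_text element from keyword parameters.

    Hypothetical helper: underscores in keyword names become hyphens,
    so raw_output="true" is rendered as raw-output="true".
    """
    attrs = {key.replace("_", "-"): str(value) for key, value in params.items()}

    # Assumption drawn from the parameter notes: a raw-output list that
    # names LaTeX only makes sense when equation="true".
    raw_output = attrs.get("raw-output", "false")
    if "latex" in raw_output.lower() and attrs.get("equation", "false") != "true":
        raise ValueError('raw-output requests LaTeX but equation is not "true"')

    rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<cml:smart_text {rendered} />"


example_tag = smart_text_tag(
    disable_pasting="true",
    equation="true",
    raw_output="['HTML', 'Latex']",
)
print(example_tag)
```

The printed tag can be pasted into the code editor and adjusted by hand; the sketch only covers attributes named in this article.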

Rich Text Output Format

{
  ableToAnnotate: <boolean>,
  annotations: {
    text: "...",
    rawContent: "...",
    contentType: "html"
  },
  metadata: { ... }
}
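
If you post-process results yourself, a record in this shape can be unpacked along the lines sketched below. This assumes the record reaches you as a standard JSON string with the field names shown above; the function name and the sample values are invented for illustration.

```python
import json


def unpack_smart_text(record_json):
    """Pull the main fields out of one Smart Text output record.

    Assumes standard JSON with the keys shown in the format above.
    Missing keys fall back to None or an empty string instead of raising.
    """
    record = json.loads(record_json)
    annotations = record.get("annotations") or {}
    return {
        "able_to_annotate": record.get("ableToAnnotate"),
        "text": annotations.get("text", ""),
        "raw_content": annotations.get("rawContent", ""),
        "content_type": annotations.get("contentType"),
    }


# Invented sample record, for illustration only.
sample_record = json.dumps({
    "ableToAnnotate": True,
    "annotations": {
        "text": "Hello world",
        "rawContent": "<p>Hello <b>world</b></p>",
        "contentType": "html",
    },
    "metadata": {},
})

print(unpack_smart_text(sample_record)["content_type"])
```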

When using the results report, you can also view the raw text without HTML markup for readability. In Quality Flow, any input text formatted with the Rich Text Editor will be displayed in subsequent jobs as formatted by the initial contributor. The reviewer will be able to modify the formatting as needed to improve the output quality.

Job Report

Refer to this article for information on Annotation Tools Job Reports.

AI Detector

AI Detector gathers behavioral signals from your contributors and computes an AI Detection score. The AI Detector works on jobs containing one smart text box (and no other questions). We do not recommend using AI Detector when you are collecting texts of fewer than 150 words.

To enable the AI Detector in your job, switch to the code editor and add <cml:ai_detector/> to your job's design.

How it works

When you enable the AI Detector, behavioral data is continuously gathered from your contributors as they complete their tasks. Our proprietary model identifies units that have a 92% chance of being AI-generated. Once three such units submitted by the same contributor are found, we know with 99.9% accuracy that at least one of these three units is AI-generated. The contributor is then considered positive to AI Detector, and the suspicious units can be downloaded in a report (see below).
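
One way to see how a 92% per-unit figure can lead to a 99.9% figure for three units is to assume, purely for illustration, that each flagged unit is independently AI-generated with probability 0.92. The article does not state how the 99.9% figure is derived, so treat the sketch below as a plausibility check rather than a description of the model.

```python
def prob_at_least_one_ai(per_unit_probability, flagged_units):
    """Probability that at least one flagged unit is AI-generated.

    Illustrative assumption: flagged units are independent of each other.
    """
    prob_none = (1 - per_unit_probability) ** flagged_units
    return 1 - prob_none


for count in (1, 2, 3):
    print(count, round(prob_at_least_one_ai(0.92, count), 4))
```

Under that assumption the value is about 0.9936 for two flagged units and about 0.9995 for three, which is consistent with the threshold being set at three units.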

If a contributor only shows two units that might have used AI, they will not be considered as positive to AI Detector. If they submit more units and within this new batch of units AI Detector spots a third unit that may have used AI, then the contributor will hit the AI threshold and be considered positive to AI Detector.
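
The threshold rule can be sketched as a simple count per contributor. The input shape (pairs of contributor ID and unit ID for units the model has flagged) and the function name are assumptions made for this example; the platform performs this computation for you.

```python
from collections import Counter

AI_THRESHOLD = 3  # three flagged units, as described above


def positive_contributors(flagged_units):
    """Return the contributors who have reached the AI threshold.

    flagged_units: iterable of (contributor_id, unit_id) pairs, one pair
    per unit flagged as possibly AI-generated (hypothetical input shape).
    """
    counts = Counter(contributor_id for contributor_id, _unit_id in flagged_units)
    return {contributor for contributor, count in counts.items() if count >= AI_THRESHOLD}


# Invented IDs: contributor "c1" has two flagged units, "c2" has three.
first_batch = [("c1", "u1"), ("c1", "u2"), ("c2", "u3"), ("c2", "u4"), ("c2", "u5")]
print(positive_contributors(first_batch))

# A later batch adds a third flagged unit for "c1", who then becomes positive.
second_batch = first_batch + [("c1", "u9")]
print(sorted(positive_contributors(second_batch)))
```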

Scores are computed every two hours (UTC), on the even hour, after the job is launched.
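
To estimate when refreshed scores might next be available, the sketch below finds the next even UTC hour strictly after a given moment. It assumes "on the even hour" means 00:00, 02:00, 04:00 UTC and so on, and it ignores any processing delay; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_score_time(now):
    """Return the next even UTC hour strictly after `now`.

    `now` should be a timezone-aware datetime; naive values are
    interpreted as local time by astimezone().
    """
    now_utc = now.astimezone(timezone.utc)
    top_of_hour = now_utc.replace(minute=0, second=0, microsecond=0)
    # From an odd hour the next even hour is 1 hour away, from an even hour 2.
    hours_ahead = 2 - (top_of_hour.hour % 2)
    return top_of_hour + timedelta(hours=hours_ahead)


print(next_score_time(datetime(2024, 12, 10, 13, 30, tzinfo=timezone.utc)))
```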

Reports

Two reports can be found in the RESULTS tab at the job level:

1. AI Detector Report - Contributor Level: lists all the contributors that have been considered positive to AI Detector, along with the number of units they submitted and the number flagged by AI Detector

2. AI Detector Report - Unit Level: lists all the units considered positive to AI Detector

Note:

Because signals are processed continually and scores are computed every two hours, your job's AI Detection scores will change while the job is running.

Report details

AI Detector Report - Contributor Level

This report is a breakdown of the job's submissions by contributor. For each contributor (Column B), the report lists the total units worked (Column D), the number of units that our model predicts may have used AI (Column C), and the number of units we predict as AI-generated with 99.9% accuracy (Column E).

AI Detector Report - Unit Level

This report is a breakdown of all units submitted in the job. When the value in predicted LLM (Column E) is 1, the probability that this unit was generated using AI is at least 92%. When the value is 0, the probability is not high enough to infer that the contributor used AI on this unit.

The PROMPT_ID (Column D) corresponds to the UnitID column in your full DATASET report and can be further mapped to the UnitDisplayID in the Quality Flow DATASET table. You can review the data in your report, or search for the PROMPT_ID in the dataset page and click on the unit view.
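
If you prefer to do this mapping offline, the sketch below filters the unit-level report for flagged rows and looks them up in the dataset report by ID. The column headers used here ("PROMPT_ID", "predicted LLM", "UnitID") follow the names in this article, but their exact spelling in the downloaded CSV files is an assumption, as are the function name and the sample rows.

```python
import csv
import io


def flagged_dataset_rows(unit_report, dataset, flag_col="predicted LLM",
                         report_id_col="PROMPT_ID", dataset_id_col="UnitID"):
    """Return dataset rows whose unit was flagged in the unit-level report.

    unit_report and dataset are open text files (or any iterable of CSV lines).
    Column names are assumptions; adjust them to match your downloaded files.
    """
    flagged_ids = {
        row[report_id_col]
        for row in csv.DictReader(unit_report)
        if row[flag_col].strip() == "1"
    }
    return [row for row in csv.DictReader(dataset) if row[dataset_id_col] in flagged_ids]


# Invented sample data standing in for the two downloaded reports.
sample_unit_report = io.StringIO("PROMPT_ID,predicted LLM\n101,1\n102,0\n103,1\n")
sample_dataset = io.StringIO("UnitID,response\n101,first\n102,second\n103,third\n")

for row in flagged_dataset_rows(sample_unit_report, sample_dataset):
    print(row["UnitID"], row["response"])
```

With real files, pass the objects returned by open(path, newline="") instead of the io.StringIO samples.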

Limitations

When you enable the AI Detector in your Smart Text job in Quality Flow, make sure to:

configure only one row per page in the job settings; you will also be prompted to do this when saving your job design

use only one tool per page in the job design; combining an additional tool with your smart_text box creates noise in the signal collection, which decreases the accuracy of the AI Detector
leave copy/paste enabled, as disabling it would also reduce the accuracy of the detection
enable AI Detector before running the job; it is not possible to enable AI Detector after a job has started running