
Guide to: Running a Smart Text Collection Job

Overview

Smart Text includes several features that help ensure high-quality, customized, and original data: disabling copy/paste, minimum and maximum word counts, robust spelling and grammar checks, and now Rich Text. Smart Text smooths the contributor's writing experience and supports high-quality output, especially for LLM-related jobs such as creative writing prompt/response pairs and response improvement.

In addition to disabling pasting, outlined below, Smart Text is compatible with the Basic Validators, such as word and character counts (described in this article), and the Smart Validators, such as regex and spelling & grammar (described in this article).

Smart Text autosaves every ten seconds, ensuring nothing is lost if contributors leave their task or encounter a crash.

Job design

From the sidebar, choose "Smart Text".


Disable Pasting

Once you have chosen Smart Text, you will see a "Disable Pasting" checkbox. When pasting is disabled (disable-pasting="true"), contributors cannot paste content into the input text box, regardless of its origin (another judgment, a document on their desktop, their browser, and so on). Copy/paste is disabled for the right-click menu, hotkeys, and keyboard shortcuts.
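As a rough illustration, the CML for a Smart Text field with pasting disabled might look like the sketch below. The cml:smart_text element and the disable-pasting attribute come from this article; the label, name, and validates values are assumptions chosen for the example, so adjust them to your own job design.

```xml
<!-- Sketch only: label, name, and validates values are illustrative assumptions -->
<cml:smart_text
  label="Write an original response to the prompt"
  name="response"
  validates="required"
  disable-pasting="true" />
```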

 

 

 

Rich Text Editor

You can now design jobs using a Rich Text Editor (RTE). The RTE lets contributors format their input text with the following:

  • Tables

  • Code blocks with syntax highlighting for HTML, SQL, Java, JavaScript, and more

  • Math/science equation formatting using syntax for LaTeX
  • Bold text

  • Underlined text

  • Italicized text

  • Bulleted lists

  • Numbered lists

When using Smart Text, Rich Text is enabled by default. To disable it, untick the checkbox in the graphical editor or set the CML attribute to rich="false".

Note: The "Undo" capability is currently only supported when rich="true". To undo, contributors can click the back arrow button or press Command+Z on their keyboard.
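For example, a plain-text Smart Text field could be sketched as follows. Only the element name and the rich attribute are taken from this article; the label and name values are hypothetical. Per the note above, contributors would not have the "Undo" capability in this configuration.

```xml
<!-- Sketch only: label and name are illustrative assumptions -->
<cml:smart_text
  label="Describe the issue in plain text"
  name="plain_description"
  rich="false" />
```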

Parameters

  • rich="true" (optional, defaults to "true"):
    • Enables the Rich Text Editor in the Smart Text tool.
  • review-data="{{review_data_column}}" and task-type="qa" (optional):
    • This parameter enables the loading of an annotation within the smart text tool. When the contributor loads the judgment, they will see a pre-annotation in the tool and have the option to make changes before submitting.

    • For the smart text tool, review-data must reference data in specific formats. Supported formats include:

      • Plain text within the dataset column
      • .txt files
      • .html files
      • A CDS reference pointing to plain text or one of the above file formats
  • equation="true" (optional, defaults to "false")
    • This parameter allows contributors to type in LaTeX syntax using a dollar sign ($) as a wrapper, which will render the LaTeX automatically within the input box of the tool. Contributors can also copy and paste content correctly into the text box, with the content rendering automatically.

    • Note: You can also include a column in your output that translates everything in the smart text box to LaTeX syntax. Refer to the raw-output parameter for more information.

  • raw-output="true" (optional, defaults to "true"):
    • Includes the following extra columns in the output, along with the raw text:
      • HTML
      • Markdown
    • If raw-output="false", the output will only include the raw text.
  • model-annotation="CML_MODEL_NAME" (optional):
    • This parameter allows you to present a model response within the cml:smart_text element. Learn more in this article.
  • read-mode="true" (optional, defaults to "false"):
    • When enabled, contributors will not be able to edit the content within the text box. This mode is intended for presenting information to contributors using the review-data parameter.

    • By default, read-mode is set to "false".
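To show how these parameters might combine, here is a minimal sketch with two fields: one presents existing content in read-only mode for review, and one accepts LaTeX input with the extra output columns. The attribute names and the {{review_data_column}} placeholder come from the list above; the label and name values are assumptions, and the exact combination should be checked against your own job design.

```xml
<!-- Sketch only: label and name values are illustrative assumptions -->

<!-- Read-only field that loads existing content from a dataset column -->
<cml:smart_text
  label="Original response (read only)"
  name="original_response"
  review-data="{{review_data_column}}"
  task-type="qa"
  read-mode="true" />

<!-- Editable field that renders LaTeX wrapped in $ and also outputs HTML and Markdown columns -->
<cml:smart_text
  label="Write your worked solution"
  name="worked_solution"
  equation="true"
  raw-output="true" />
```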

Rich Text Output Format

{
  ableToAnnotate: <boolean>,
  annotations: {
    text: "...",
    rawContent: "...",
    contentType: "html"
  },
  metadata: { ... }
}

In the results report, you can also view the raw text without HTML markup for readability. In Quality Flow, any input text formatted with the Rich Text Editor will be displayed in subsequent jobs as formatted by the initial contributor. The reviewer can modify the formatting as needed to improve the output quality.

 

Job Report

Refer to this article for information on Annotation Tools Job Reports.

