Scale and grow your text utterance collection, AI chat feedback and Audio Transcription use cases by including smart validators that automatically detect, flag and reject invalid utterances before they can be submitted.
- Spelling & Grammar and Regex Validation are available for cml:smart_text, cml:ai_chat_feedback and cml:audio_transcription
- Language detection, Gibberish detection and Duplicate detection are available for cml:smart_text, cml:text and cml:textarea
Note
Smart Validators can be enabled on your Team account. Please reach out to your Customer Success Manager if you are interested in turning this feature on.
Glossary
- Utterance - The piece of text to be collected from the contributor
- Gibberish - A piece of text that is illogical or incoherent
- Prompt - Data provided to the contributors to give them guidance on what utterances to collect
Smart Validators
Smart validators can be accessed via the graphical editor, or specified in the CML via the code editor. Smart validators can also be used in conjunction with basic validators such as Word count (see this article and this article).
Example Job design - Code Editor:
<cml:text label="Sample text field:" validates="required unique:['within_job', 0.8] lang:['en', 0.10] gibberish:[0.10] wordCountMin:8" />
Spelling and Grammar detection
validates="required smart_spelling_and_grammar:['en', 'xx-variety']"
When using Smart Text, Audio Transcription, or AI Chat Feedback in English, you can enable a grammar and spelling check for the input text. Our grammar and spelling check will catch issues with spelling, punctuation, agreement and conjugations.
To add these checks to your job:
- Expand "Smart Validation"
- Under Add Formatting Rule, choose "Spelling & Grammar Detection"
- Select the language and the locale you want to target. Click Save and the grammar and spelling validation will be applied to the job. Currently the varieties of English shown in the screenshot below are supported.
- Choosing "Don't suggest any changes based on English variety" will identify only those errors that would be errors in any of the supported varieties of English.
- Spelling and grammar issues will be underlined in red as the contributor types.
- By clicking on the underline, they will receive a suggestion to fix the issue.
- To accept a suggestion, contributors click the corrected form of the word. If contributors do not agree with a suggestion, they can click the trash icon to reject it and either leave the text as is or make any manual corrections needed before clicking "Submit" again.
- enforced (optional, defaults to "false")
- By default, contributors are not required to click on the underlines; i.e. they will be able to submit their response without viewing, fixing or trashing the suggestions. This is to avoid the suggestions inadvertently slowing down throughput.
- Where corrections are of very high priority, you can enforce inspection of the suggestions by using the parameter enforced in the code editor.
- When enforced="true", contributors will receive the following message upon submission until they have clicked on each underline and accepted or rejected the suggestion.
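As a minimal sketch of the enforced setting described above, the field below combines the spelling and grammar validator with enforced="true". It assumes that enforced is written as an attribute on the tag itself, which the text implies but does not show. The label is illustrative, and 'xx-variety' is the placeholder from the syntax line at the top of this section, to be replaced with one of the supported English varieties.

```xml
<!-- Sketch only: placing enforced as a tag attribute is an assumption; the label is illustrative. -->
<!-- Replace 'xx-variety' with one of the supported English varieties. -->
<cml:smart_text
  label="Rewrite the sentence in your own words:"
  validates="required smart_spelling_and_grammar:['en', 'xx-variety']"
  enforced="true" />
```

Leaving enforced out keeps the default behaviour, where contributors can submit without inspecting the suggestions.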
Regex Detector
smart_regex:[['regex','error_description','fix_suggestion']]
- Regex detection allows you to validate contributor input against any regular expression allowed in JavaScript (in cml:smart_text, cml:ai_chat_feedback or cml:audio_transcription jobs).
- When a contributor enters text that matches the specified expression anywhere in the input, the match will be flagged.
- To add this validator to your job:
- Expand "Smart Validation"
- Under Add Formatting Rule, choose "Regex Detection"
- Enter the regular expression the tool should be detecting
- Enter the error description that explains why the regular expression was flagged
- Optionally tick the "Enable fix suggestion" box to also suggest a correction for the error
- Test the regex before applying it to your job by using the "Test the rule" input box. If the text you enter doesn't match your regex, the message "Input does not match Regex" will be displayed
- For the above example, the contributor will see a red underline where the regex matches.
- When they click on the line they will see the message "fix this" with a trash icon. Clicking on the trash icon allows them to ignore the flag.
- If a fix suggestion has been set, they will also see the fix suggestion:
- It is possible to have multiple regex detection patterns and corrections running in the same job.
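The following sketch shows how two regex rules might be combined in the code editor. It assumes that smart_regex is placed inside the validates attribute like the other validators, and that each inner list in the outer brackets is one rule in the order regex, error description, fix suggestion, as the syntax line at the top of this section suggests. The patterns, messages and label are hypothetical.

```xml
<!-- Sketch only: assumes smart_regex goes inside validates and that each inner list is one rule. -->
<!-- The patterns, error descriptions and fix suggestions are illustrative. -->
<cml:smart_text
  label="Describe the product in one sentence:"
  validates="required smart_regex:[['colour','Use US spelling','color'],['alot','This should be two words','a lot']]" />
```

Because a rule is flagged when the expression matches anywhere in the input, it is worth checking each pattern in the "Test the rule" box before launching.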
Language Detector
required lang:['{language}', {threshold_number}]
- This validator is used to ensure contributors are submitting text in the correct language. Learn more about our Language Detector model in this article.
- The currently supported languages are:
  - English (en)
  - German (de)
  - French (fr)
  - Spanish (es)
  - Japanese (ja)
  - Portuguese (pt)
  - Italian (it)
- You will need to provide a threshold that will be used to evaluate each contributor's submission. The lower the threshold, the more lenient the evaluation will be.
- Note: you may only validate for one target language per field.
Fig 2. Language Detection Validator
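As an illustration of the syntax above, this sketch requires German text in a textarea field. The label is hypothetical, and the threshold of 0.10 simply mirrors the value used in the sample field earlier in this article rather than a recommended setting.

```xml
<!-- Sketch only: label is illustrative; 0.10 mirrors the threshold in the sample field earlier in this article. -->
<cml:textarea
  label="Write a short product review in German:"
  validates="required lang:['de', 0.10]" />
```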
Gibberish Detector
required gibberish:[{threshold_number}]
- This validator is used to ensure contributors are submitting text that is cohesive and coherent. The model auto-detects the language they are typing in and then evaluates the probability that what they’re typing is valid text in that language. Learn more about our Gibberish Detector model in this article.
- This validator works best on text longer than 10 words.
- The currently supported languages are:
  - English (en)
  - German (de)
  - French (fr)
  - Spanish (es)
  - Japanese (ja)
  - Portuguese (pt)
  - Italian (it)
- Note: You can use this in conjunction with other validators, but you may only set one threshold per individual field.
Fig 3. Gibberish Detection Validator
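Since the gibberish validator works best on longer text, one option is to pair it with the basic word count validator, as in the sketch below. The label and the values 0.10 and 10 are illustrative. Whether wordCountMin treats its value as inclusive is not stated here, so treat the exact minimum as an assumption.

```xml
<!-- Sketch only: label, threshold and minimum word count are illustrative values. -->
<cml:text
  label="Describe your last customer support experience:"
  validates="required gibberish:[0.10] wordCountMin:10" />
```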
Duplicate Detection
The following validators give you the option of enforcing only unique submissions of text. This is helpful if you need many diverse examples of responses to the same prompt. Note that two of these validators also include a threshold parameter; see below for more detail.
Fig 4. Duplicate Detection Validator
- In this job, across all contributors
  validates = "required unique:['within_job', {threshold_number}]"
  If your job is collecting multiple utterances per prompt and you only want unique utterances across the whole job, use this option.
- In this job, across all contributors, across a unique prompt value
  validates = "required unique:['within_column_name']"
  If your job is collecting utterances for multiple prompts and you want to ensure that you are getting unique responses for each prompt, you can use this validator to specify the column that will be validated on. This enforces unique utterances for each prompt, but there may be duplicate utterances across the whole dataset.
- In this job, across the unique contributor's submissions
  validates = "required unique:['within_contributor', {threshold_number}]"
  If your job is collecting multiple utterances and you are interested in the range of likely utterances, or would like to understand the frequency of certain utterances across different contributors, use this setting. This ensures each contributor's submissions are unique, but there may be duplicate utterances across contributors.
- In multiple jobs, across all contributors
  validates = "required unique:[{job_id}, {job_id}]"
  This validator allows you to compare against data collected in one or more completed jobs. You can use this validator if you want to collect additional unique utterances.
  In the graphical editor, you will be asked to enter the job ID(s) to compare against, separated by commas. All jobs to be compared should have identical CML.
Fig 5. Duplicate Detection Validator: Enter Job IDs
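For the multiple-jobs option, a code editor version might look like the sketch below. The job IDs 1111111 and 2222222 are hypothetical stand-ins for completed jobs with identical CML, and writing them as unquoted numbers is an assumption based on the placeholder syntax shown above.

```xml
<!-- Sketch only: 1111111 and 2222222 are hypothetical job IDs; unquoted numbers are an assumption. -->
<cml:text
  label="Write a new way to ask for the weather:"
  validates="required unique:[1111111, 2222222]" />
```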
Duplicate Detection Threshold
An additional duplicate detection setting is available in Text Collection jobs that are running in Quality Flow (to find out more about Quality Flow see this article and this article). This setting will allow you to specify a threshold at which utterances should be considered duplicates.
Learn more about the threshold settings in this article.
Duplicate detection threshold is available in Quality Flow work jobs for the following settings:
- In this job, across all contributors
- In this job, across the unique contributor's submissions
Fig 6. Duplicate threshold
The first submitted utterance will serve as the baseline for comparison with subsequent submissions. Upon submission, any utterance that meets or exceeds the specified threshold is considered a duplicate and will not be accepted, and the contributor will receive an error message as in the screenshot below.
Fig 7. Contributor view: duplicate detection threshold set to 0.9
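A threshold like the one shown in Fig 7 could be expressed in the code editor roughly as follows. This is a sketch that reuses the within_contributor syntax from the Duplicate Detection section. The label is illustrative, and the field is assumed to sit in a Quality Flow work job, where the threshold setting is available.

```xml
<!-- Sketch only: 0.9 mirrors the threshold in Fig 7; the label is illustrative. -->
<!-- Assumes the field is part of a Quality Flow work job. -->
<cml:text
  label="Write a new way to ask for the weather:"
  validates="required unique:['within_contributor', 0.9]" />
```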