Machine Learning Assisted Text Utterance Collection - Code Editor

This product is designed to help our chatbot and conversational AI customers scale and grow their text utterance collection use cases. The CML tags cml:text and cml:textarea can be used for use cases and applications such as transcription and translation.

Note:  Machine Learning Assisted Text Utterance is an add-on feature. Please reach out to your Customer Success Manager if you’re interested in purchasing access.



  • Utterance - The piece of text to be collected from the contributor 
  • Coherence - The quality of a piece of text being logical and consistent
  • Prompt - Data provided to the contributors to give them guidance on what utterances to collect


Job Design 


Fig 1. Example of a fully incorporated validator

Basic Validators

Word count validator

wordCountMin:{number} and wordCountMax:{number}

  • This validator checks that contributors submit the right number of words. It allows you to validate that each submission contains a minimum and/or maximum number of words. A word is defined as any text delimited by whitespace.
  • Note: wordCountMin and wordCountMax can be used together or separately, and either can be combined with any other validator.
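
As a rough illustration of the whitespace rule above, the following Python sketch counts words the way the definition describes and checks the count against optional bounds. The function name and parameters are hypothetical, treating the bounds as inclusive is an assumption, and this is not the platform's implementation.

```python
from typing import Optional


def within_word_count(text: str,
                      word_count_min: Optional[int] = None,
                      word_count_max: Optional[int] = None) -> bool:
    """Return True if `text` satisfies the optional word-count bounds.

    A word is any run of text delimited by whitespace.
    Inclusive bounds are an assumption of this sketch.
    """
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    count = len(text.split())
    if word_count_min is not None and count < word_count_min:
        return False
    if word_count_max is not None and count > word_count_max:
        return False
    return True


print(within_word_count("book a table for two", 3, 10))  # True
print(within_word_count("hi", word_count_min=3))         # False
```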


Smart Validators 

Language Detector 

required lang:['{language}', {threshold_number}]

  • This validator is used to ensure contributors are submitting text in the correct language. Learn more about our Language Detector model here.
  • The currently supported languages are:
    • English - en
    • German - de
    • French - fr
    • Spanish - es
    • Italian - it
    • Portuguese - pt
    • Japanese - ja
  • You will need to provide a threshold that will be used to evaluate each contributor's submission. The lower the threshold, the more lenient the evaluation will be.
  • Note: You may only validate for one target language per field.
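
To make the template above concrete, here is a small Python sketch that fills in the language code and threshold and rejects codes that are not in the supported list. The helper name is hypothetical, and the threshold 0.7 is only an illustrative value; this article does not state the valid threshold range.

```python
# Language codes listed in this article.
SUPPORTED_LANGUAGES = {"en", "de", "fr", "es", "it", "pt", "ja"}


def lang_validator(language: str, threshold: float) -> str:
    """Fill in the template required lang:['{language}', {threshold_number}]."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    # The threshold is passed through unchanged; its valid range is
    # not specified in the article, so no range check is attempted.
    return f"required lang:['{language}', {threshold}]"


print(lang_validator("en", 0.7))  # required lang:['en', 0.7]
```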


Coherence Detector 

required coherence:[{threshold_number}]

  • This validator is used to ensure contributors are submitting text that is cohesive and coherent. The model auto-detects the language they are typing in and then evaluates the probability that what they’re typing is valid text in that language.  Learn more about our Coherence Detector model here.
  • This validator works best on text longer than 10 words. 
  • The currently supported languages are:
    • English
    • German
    • French
    • Spanish
    • Japanese
    • Portuguese 
    • Italian
  • Note: You can use this in conjunction with other validators, but you may only set one threshold per individual field.
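
The sketch below shows one way to assemble a validator string that pairs the coherence validator with another validator while enforcing the one-threshold-per-field rule. The helper names are hypothetical, the threshold 0.5 is an illustrative value, and joining validators with spaces is an assumption based on the "required ..." form shown in this article.

```python
def coherence_validator(threshold: float) -> str:
    """Fill in the coherence:[{threshold_number}] part of the template."""
    return f"coherence:[{threshold}]"


def combine_validators(*validators: str) -> str:
    """Join validators into one string that starts with 'required'.

    Space-separated joining is an assumption of this sketch.
    """
    coherence_count = sum(v.startswith("coherence:") for v in validators)
    if coherence_count > 1:
        # The article allows only one coherence threshold per field.
        raise ValueError("only one coherence threshold may be set per field")
    return " ".join(["required", *validators])


print(combine_validators("wordCountMin:10", coherence_validator(0.5)))
# required wordCountMin:10 coherence:[0.5]
```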

Duplicate Detection

The following validators give you the option of enforcing only unique submissions of text. This is helpful if you need many diverse examples of the same utterance.

  • In this job, across all contributors
    • validates = "required unique:['within_job']"
      • If your job is collecting many judgments per prompt, you'll want to use this option.
      • Please note, if you would like to check for unique submissions from within the currently running job and a list of other jobs, please use the validator across multiple jobs ("unique:[{job_id}, {job_id}]") listed below and add the currently running job to the list of job ids.
  • In multiple jobs, across all contributors
    • validates = "required unique:[{job_id}, {job_id}]" 
      • This validator allows you to compare new submissions against data collected in completed jobs. You can use this validator if you want to collect additional unique references.
      • Note: You must input the job IDs and the cml value that match the previous jobs. Best practice is to copy the job with rows and order additional judgments without changing the CML. 
  • In this job, across all contributors, for each unique prompt value
    • validates = "required unique:['within_column_name']" 
      • If your job is collecting utterances for multiple prompts, you can use this validator to specify the column that will be validated on. Utterances are enforced to be unique within each row, but the same utterance may still appear under more than one row.
  • In this job, across each contributor's own submissions
    • validates = "required unique:['within_contributor']" 
      • This validator ensures contributors do not submit duplicate answers within a job. You can use this if you’d like to get a sense of the variance and frequency of utterances.
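
The difference between the in-job scopes can be sketched in Python as a choice of which key an utterance must be unique under. This is only a conceptual illustration: the class and parameter names are hypothetical, exact string comparison is an assumption (the article does not say how duplicates are matched), and the multi-job option is left out because it depends on data from earlier jobs.

```python
from typing import Set, Tuple


class UniquenessChecker:
    """Tracks accepted utterances under one uniqueness scope (illustrative)."""

    SCOPES = {"within_job", "within_column_name", "within_contributor"}

    def __init__(self, scope: str) -> None:
        if scope not in self.SCOPES:
            raise ValueError(f"unknown scope: {scope!r}")
        self.scope = scope
        self._seen: Set[Tuple[str, str]] = set()

    def accept(self, utterance: str, contributor_id: str, prompt: str) -> bool:
        """Return True and record the utterance if it is new in this scope."""
        if self.scope == "within_job":
            group = ""              # one shared pool for the whole job
        elif self.scope == "within_column_name":
            group = prompt          # separate pool per prompt value
        else:
            group = contributor_id  # separate pool per contributor
        key = (group, utterance)    # exact-match comparison (assumption)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


checker = UniquenessChecker("within_column_name")
print(checker.accept("book a table", "c1", "restaurant"))  # True
print(checker.accept("book a table", "c2", "restaurant"))  # False
print(checker.accept("book a table", "c2", "hotel"))       # True
```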

For more information on the Model, check out this article.
