This product is designed to help our chatbot and conversational AI customers scale their text utterance collection use cases. The CML tags cml:text and cml:textarea can be used for applications such as transcription and translation.
Note: Machine Learning Assisted Text Utterance is an add-on feature. Please reach out to your Customer Success Manager if you’re interested in purchasing access.
Glossary
- Utterance - The piece of text to be collected from the contributor
- Coherence - The degree to which a piece of text is logical and consistent
- Prompt - Data provided to the contributors to give them guidance on what utterances to collect
Job Design
Fig 1. Machine Learning Assisted Text Validators
Smart Validators
Language Detector
- This validator is used to ensure contributors are submitting text in the correct language. Learn more about our Language Detector model here.
- The currently supported languages are:
- English
- German
- French
- Spanish
- Japanese
- Portuguese
- Italian
- Dutch
- You will need to provide a threshold that will be used to evaluate each contributor's submission. The lower the threshold, the more lenient the evaluation will be.
- Note: you may only validate for one target language per field.
Fig 2. Language Detection Validator
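To illustrate how a threshold makes the evaluation stricter or more lenient, here is a minimal sketch in Python. It is not the platform's implementation: the function name passes_language_check, the score dictionary, the 0 to 1 score range, and the "greater than or equal" comparison are all assumptions made for illustration.

```python
def passes_language_check(scores, target_language, threshold):
    """Return True when the (hypothetical) confidence score for the
    target language meets or exceeds the configured threshold."""
    # A language missing from the scores is treated as zero confidence.
    confidence = scores.get(target_language, 0.0)
    return confidence >= threshold


# Hypothetical model output for a single submission (assumed 0-1 scale).
scores = {"en": 0.72, "de": 0.20, "fr": 0.08}

# A lower threshold is more lenient: the same submission passes at 0.5
# but is rejected at 0.9.
lenient = passes_language_check(scores, "en", 0.5)
strict = passes_language_check(scores, "en", 0.9)
```

In this sketch, the same submission is accepted under the lower threshold and rejected under the higher one, which matches the guidance that a lower threshold gives a more lenient evaluation.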
Coherence Detector
- This validator is used to ensure contributors are submitting text that is cohesive and coherent. The model auto-detects the language they are typing in and then evaluates the probability that what they’re typing is valid text in that language. Learn more about our Coherence Detector model here.
- This validator works best on text longer than 10 words.
- The currently supported languages are:
- English
- German
- French
- Spanish
- Japanese
- Portuguese
- Italian
- Note: You can use this in conjunction with other validators, but you may only set one threshold per individual field.
Fig 3. Coherence Detection Validator
Duplicate Detection
The following validators give you the option of enforcing only unique submissions of text. This is helpful if you need many diverse examples of the same utterance.
- In this job, across all contributors
- If your job is collecting many judgments per prompt, you'll want to use this option.
- In this job, across all contributors, across a unique prompt value
- If your job is collecting utterances for multiple prompts, you can use this validator to specify the column that will be validated on. We will enforce unique utterances within each row, but the same utterance may still be submitted for more than one row.
- In this job, across the unique contributor's submissions
- This validator ensures contributors do not submit duplicate answers within a job. You can use this if you’d like to get a sense for the variance and frequency of utterances.
- In multiple jobs, across all contributors
- This validator allows you to compare new submissions against the data collected in a completed job. You can use this validator if you want to collect additional unique utterances.
- Note: You must input the job IDs and the CML value that match the previous jobs. Best practice is to copy the job with rows and order additional judgments without changing the CML.
Fig 4. Duplicate Detection Validator
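The four duplicate detection options differ only in the scope within which an utterance must be unique. The Python sketch below shows one way to picture that, using a different lookup key per scope. Everything in it is hypothetical: the scope names, the DuplicateChecker class, and the normalization step (lowercasing and collapsing whitespace) are assumptions for illustration, not the platform's actual matching rules.

```python
def normalize(text):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def dedup_key(scope, job_id, contributor_id, prompt, utterance):
    """Build the key that must be unique under the given (hypothetical) scope."""
    text = normalize(utterance)
    if scope == "job":              # in this job, across all contributors
        return (job_id, text)
    if scope == "job_prompt":       # ... across a unique prompt value
        return (job_id, prompt, text)
    if scope == "job_contributor":  # ... across one contributor's submissions
        return (job_id, contributor_id, text)
    if scope == "multi_job":        # in multiple jobs, across all contributors
        return (text,)
    raise ValueError(f"unknown scope: {scope}")


class DuplicateChecker:
    def __init__(self, scope):
        self.scope = scope
        self.seen = set()

    def submit(self, job_id, contributor_id, prompt, utterance):
        """Return True if the utterance is accepted, False if it is a duplicate."""
        key = dedup_key(self.scope, job_id, contributor_id, prompt, utterance)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


checker = DuplicateChecker("job_prompt")
first = checker.submit(1, "contributor_a", "greeting", "Hello there")
repeat = checker.submit(1, "contributor_b", "greeting", "hello   there")
other_prompt = checker.submit(1, "contributor_b", "farewell", "hello there")
```

Here the second submission is rejected because it repeats an utterance for the same prompt, while the third is accepted because the prompt differs, which mirrors the per-prompt option described above.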
For more information on the model, check out this article.