Overview
AI Chat Feedback jobs allow you to monitor and evaluate one or more of your LLMs through live conversations with domain experts. AI Chat Feedback enables contributors to interact with, and give feedback on, multiple model outputs based on real-time responses.
Note:
The AI Chat Feedback tool is only available in Quality Flow. To set up an AI Chat Feedback job, you will first need to have Quality Flow and LLM enabled on your account and be able to configure models. More information can be found in this article.
Building a Job
Once you have your Quality Flow project created and models available, you can set up your job using the code editor or the graphical editor.
The Code Editor
The CML tag for this tool is <cml:ai_chat_feedback/>, as in the simple example below. Note that source-data is the only required parameter. Details on the remaining parameters available in the code editor are given below, after the graphical editor instructions.
<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" validates="required" />
The Graphical Editor
This section outlines the steps for enabling an AI Chat Feedback job using the graphical editor.
Step 1: Click on the AI Chat Feedback icon in the graphical editor:
Step 2: Select the column from your source data that contains the IDs assigned to your model(s). The source data is an obligatory parameter for this tool; you will need to upload your data and provide the source column before you can continue designing your job.
Step 3: Configure additional settings such as rich text, enhanced responses, model response rewrite and/or disable pasting (see below under CML parameters for more information on these options) and click Save & Close.
Step 4: Configure any Smart Validators, regex and/or spelling & grammar (see this article for more on Smart Validators).
Step 5: Click on "Manage Language Models" to enable the models in your Job.
Step 6: Select the models you'll be using in your Job and click "Save".
Parameters
This tag supports several parameters in standard single-model jobs; source-data is the only required parameter. There are additional optional parameters for advanced use cases, described below.
- source-data (required): This is the Model ID that was assigned to the model when you configured it for your Team (see this article), or the column name from your source data that contains the Model ID. This can be an array of Model IDs if multiple models are being used: e.g. MODEL_ID or [MODEL_ID, MODEL_ID].
- preamble (optional): This is information that your model(s) will take in as further information or context related to the conversation going forward. Contributors will not be able to give feedback or answer any ontology questions directly related to the preamble.
  - For the preamble to function, your model needs to have an endpoint capable of ingesting a preamble/context/more information. If your model supports this input type, navigate to the Models configuration in your Team Account and incorporate something like the following into your input schema (see this article), replacing "preamble_override" with the model-specific parameter expected by the model endpoint:
  - "preamble_override": "${dynamic_attrs.preamble}"
- live-preamble (optional): This parameter allows contributors to create their own preamble or context (contributors will not be able to give feedback or answer ontology questions related to the live-preamble). When a contributor loads a task, they will see a pop-up prompting them to supply context or preamble for the model. This context is fed to the model each time a message is sent to it.
  - Syntax: live-preamble='[true]', as in the screenshot below, where the default instruction for contributors is "Please provide the model with the necessary context for the conversation." OR live-preamble="[true,'Your text here']", where you can customize your instruction to the contributors within single quotes.
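  - For example, a tag that enables a live preamble with a custom instruction (the label, name, and column reference are carried over from the earlier example and are illustrative) could look like this:
  <cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" live-preamble="[true,'Describe the scenario the model should assume before you begin.']" validates="required" />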
- review-data (optional): This is the column name from your source data that contains any existing chat history output by the tool. Refer to the "Results" section to see the format.
  - You can add review-data by utilizing the column selector in the graphical editor, or the cml attribute.
  - The purpose of review-data is to provide contributors (and the model) with any previous turns so that the contributor and model can carry on the interaction. This is an example of what a job with review-data looks like:
- seed (optional): This is the column name from your source data that contains a pre-written prompt. When the job is launched, the contributor will be able to see a pre-written message in the box where they would normally type their prompt. They will not be able to edit this message; they will only be able to send it and then proceed to evaluating the response(s).
  - The next image shows an example where the review-data and seed parameters have been used. The greyed-out USER/RESPONSE text reflects the review-data, while the pre-written prompt in the input field ("What should I pack?") reflects the seed data.
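  - As a sketch, a job that loads existing chat history and a pre-written first prompt might use a tag like the one below; the review_data and seed_prompt column names are illustrative and should match your own source data.
  <cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" review-data="{{review_data}}" seed="{{seed_prompt}}" validates="required" />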
- min-turns (optional, defaults to 0): This attribute allows you to control the minimum number of turns a contributor must complete before submitting their judgment. If the contributor fails to meet the specified number of turns, they will receive an error message (as in the screenshot below) and their submission will be blocked.
  - Syntax: min-turns="3"
- max-turns (optional): This attribute allows you to control the maximum number of turns a contributor can complete before submitting their judgment. Once the contributor reaches the specified number of turns, they will be prompted to submit their work (as in the screenshot below) and any further turns will be blocked.
  - Syntax: max-turns="5"
- custom-response (optional, defaults to "false"): Enables contributors to provide their own response to the prompt in cases where the model response(s) are not optimal. This is so that contributors do not have to accept mediocre or subpar replies.
  - You still have the ability to gather ontology responses and contributor preferences for the model response(s), but the custom response is the one that will be used to continue the interaction.
  - Enable through the cml, or through the graphical editor by checking "Model Response Rewrite" under Custom Configurations.
- rich (optional, defaults to "false"): This parameter enables contributors to use rich text in the main tool input box. custom-response and live-preamble always include rich text; this attribute gives you the added option to choose the input complexity for the main input box in other scenarios.
- expanded-data (optional, defaults to "true"): By default, or when enabled in the AI Chat Feedback tool, this parameter generates each data field as an individual column in Quality Flow's dataset section, in addition to the standard JSON format.
  - To disable this feature, set expanded-data="false" in the cml:ai_chat_feedback tag.
  - Note: This is currently not supported with the num-responses parameter.
  - For further details, refer to the "Results" section below.
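To illustrate how several of these options can be combined, the sketch below (attribute values are examples only) enables a live preamble, turn limits, custom responses, rich text, and expanded data in a single tag:
<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" live-preamble="[true]" min-turns="3" max-turns="10" custom-response="true" rich="true" expanded-data="true" validates="required" />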
In QA or review jobs you can optionally also use the following parameters:
- allow-continue (optional, defaults to "false"): In QA jobs, contributors are able to access previous rounds of contributor/chatbot interaction and give feedback on each individual response based on the set ontology. This attribute allows you to control whether or not the QA contributor can also send new messages to the chatbot.
- show-feedback-answers (optional, defaults to "true"): This attribute allows you to control whether or not QA contributors are able to see the previous contributor's model response selections or ontology responses.
- allow-custom-response-edit (optional, defaults to "false"): This attribute allows the QA contributor to edit any of the custom responses (new completions) from a previous conversation.
  - If used in combination with allow-continue, the updated custom responses are sent to the model again.
  - If using these attributes together, it is recommended to only make slight style, spelling or grammatical adjustments. Changing the custom response too much may cause confusion in subsequent model responses.
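As a sketch of how these attributes might be combined in a QA job (the review_data column name is illustrative), a tag could look like this:
<cml:ai_chat_feedback label="Review chat feedback" name="annotation" source-data="{{source_data}}" review-data="{{review_data}}" allow-continue="true" show-feedback-answers="true" allow-custom-response-edit="true" validates="required" />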
Finally, when using multiple models (see Model Response Selection, below) there are two additional parameters:
- enhanced (optional, defaults to "false"): In use cases where contributors are asked to choose between responses from multiple models, this attribute allows them to provide additional detail about their choice.
  - Enable through the cml or through the graphical editor, by checking "Enhanced Responses" under Custom Configurations.
- num-responses="n" (optional, defaults to the total number of configured models): In cases where you wish to evaluate many models but only want to present some of the model responses in a given round, you can use this attribute to specify how many responses to collect and present.
  - In each round of the conversation, the tool randomly chooses "n" models out of all of those you have configured, and presents their responses to the contributor.
  - Only the responses randomly selected in each round will appear in the output.
  - For example, if the following cml is used, the tool will present two responses from models 87, 88 or 89 per round.
<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="[87,88,89]" num-responses="2" validates="required" />
Response Level Feedback
If you would like contributors to provide feedback on individual model responses, you can configure ontology questions.
Step 1: Go to "Manage AI Chat Feedback Ontology".
Step 2: Configure your ontology question(s).
Once an ontology is configured, the tool will require contributors to provide feedback for each response.
Contributors will not be able to submit their judgment until all required questions have been answered.
Model Response Selection
If you have set up multiple models within the tool, contributors will be presented with responses from all the configured models. After answering any ontology questions for each response, they must choose which response they wish to continue the conversation with by selecting the radio button located to the left of that response.
If the responses are very similar and the contributor can't choose between them, they can select the "Responses are near identical" button. The tool will randomly choose a response for them to continue the conversation with.
Once the contributors have chosen the response they prefer, if enhanced="true" they will then be asked an additional question about how the responses compare.
<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" enhanced="true" validates="required" />
Upload Data
Upload data into the job as a CSV where each row represents a conversation to be collected. Your .csv must contain at least one column containing the Model ID.
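As a minimal illustration, a source file could look like the following, where the source_data column holds the Model ID(s) and the other column name is an example only:
source_data,seed_prompt
"[87,88]","I am planning a winter trip to Norway."
"87","Suggest a weekly meal plan for two people."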
Results
1. The results from an <cml:ai_chat_feedback/> tag include the entire chat history (both model and user messages), the model ID(s) used, any answers provided by the user if ontology questions were configured, AND the model response selection if multiple models were configured. If enhanced="true", there will also be a "confidence" line. For example:
{
  "ableToAnnotate": true,
  "annotation": {
    "chat": {
      "0": {
        "id": "0",
        "prompt": "hi there",
        "completion": [{
          "modelId": 18,
          "completion": "Hello! How can I assist you today?",
          "selection": true,
          "confidence": "much-better"
        }, {
          "modelId": 19,
          "completion": "Hello there! How can I assist you today?",
          "selection": false
        }],
        "feedback": {
          "18": [{
            "questionId": "d8ba7201-ee5b-4ef8-8e5c-894711b48e2b",
            "type": "Multiple Choice",
            "name": "is_this_response_factually_correct",
            "answer": {"values": "yes"}
          }],
          "19": [{
            "questionId": "d8ba7201-ee5b-4ef8-8e5c-894711b48e2b",
            "type": "Multiple Choice",
            "name": "is_this_response_factually_correct",
            "answer": {"values": "no"}
          }]
        }
      },
      "enableFeedback": true
    },
    "model": "[19,18]"
  }
}
2. By default, or when expanded-data="true" is set in the cml tag, you will also see the data expanded into individual columns in your Quality Flow dataset UI, in addition to the JSON above. You can find the broken-out columns in the "Latest Judgment" section of your columns filter:
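If you need to post-process the downloaded results programmatically, the minimal Python sketch below (which assumes the JSON structure shown in the example above; the file name is illustrative) pulls out the selected completion and the ontology answers for each turn:
import json

# Load one judgment's output; the file name is illustrative.
with open("judgment.json") as f:
    result = json.load(f)

chat = result["annotation"]["chat"]
for turn_id, turn in chat.items():
    # Skip the enableFeedback flag stored alongside the numbered turns.
    if not isinstance(turn, dict):
        continue
    print(f"Turn {turn_id} prompt: {turn['prompt']}")
    # The completion marked "selection": true is the one used to continue the chat.
    selected = next((c for c in turn["completion"] if c.get("selection")), None)
    if selected:
        print(f"  Selected model {selected['modelId']}: {selected['completion']}")
    # Ontology feedback is keyed by model ID, with one entry per ontology question.
    for model_id, answers in turn.get("feedback", {}).items():
        for answer in answers:
            print(f"  Model {model_id} - {answer['name']}: {answer['answer']['values']}")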