
Guide to: Running an AI chat feedback job

Overview

AI Chat Feedback jobs allow you to monitor and evaluate one or more of your LLMs through live conversations with domain experts. AI Chat Feedback enables contributors to interact with, and give feedback on, multiple model outputs based on real-time responses.

Screenshot 2023-08-28 at 11.11.15 AM.png

Note:

The AI Chat Feedback tool is only available in Quality Flow. To set up an AI Chat Feedback job, you will first need to have Quality Flow and LLM enabled on your account and be able to configure models. More information can be found in this article.

 

Building a Job

Once you have your Quality Flow project created and models available, you can set up your job using the code editor or the graphical editor.

The Code Editor

The CML tag for this tool is <cml:ai_chat_feedback/>, as in the simple example below. Note that source-data is the only required parameter. Details for the use of further parameters in the code editor are given below, after the graphical editor instructions.

<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" validates="required" />

The Graphical Editor

This section outlines the steps for enabling an AI Chat feedback job using the graphical editor.

Step 1: Click on the AI Chat Feedback icon in the graphical editor:

Screenshot 2023-11-17 at 2.09.15 PM.png

 

Step 2: Select the column from your source data that contains the IDs assigned to your model(s). The source data is a required parameter for this tool; you will need to upload your data and provide the source column before you can continue designing your job.

Step 3: Configure additional settings such as rich text, enhanced responses, model response rewrite, and/or disable pasting (see the Parameters section below for more information on these options) and click Save & Close.

Screenshot 2024-01-26 at 1.53.42 PM.png

Step 4: Configure any smart validators, regex, and/or spelling & grammar checks (see this article for more on Smart Validators).

 

Step 5: Click on "Manage Language Models" to enable the models in your Job.

Screenshot 2023-11-17 at 2.16.01 PM.png

Step 6: Select the models you'll be using in your Job and click "Save".

Screenshot 2023-11-17 at 2.16.09 PM.png

Parameters

This tag supports several parameters in standard single-model jobs; source-data is the only required parameter. There are additional optional parameters for advanced use cases, described below. 

  • source-data (required)
    • This is the Model ID that was assigned to the model when you configured it for your Team (see this article), or the column name from your source data that contains the model ID. This can be an array of Model IDs if multiple models are being used: e.g. MODEL_ID or [MODEL_ID, MODEL_ID]. 
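For instance, a minimal sketch of a job that collects feedback on two models at once (the Model IDs 87 and 88 are placeholders):

<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="[87,88]" validates="required" />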

  • preamble (optional)
    • This is information that your model(s) will take in as additional context for the conversation going forward. Contributors will not be able to give feedback or answer any ontology questions directly related to the preamble.

    • For the preamble to function, your model needs an endpoint capable of ingesting a preamble or additional context. If your model supports this input type, navigate to the Models configuration in your Team Account and incorporate something like the following into your input schema (see this article), replacing "preamble_override" with the model-specific parameter expected by the model endpoint.

"preamble_override": "${dynamic_attrs.preamble}"
  • live-preamble (optional)
    • This parameter allows contributors to create their own preamble or context (contributors will not be able to give feedback or answer ontology questions related to the live-preamble). When a contributor loads a task, they will see a pop-up prompting them to supply context or preamble for the model. This context is fed to the model each time a message is sent to it.
    • Syntax: live-preamble='[true]', where the default instruction for contributors is "Please provide the model with the necessary context for the conversation." OR live-preamble="[true,'Your text here']", where you can customize your instruction to the contributors within single quotes.
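For example, a job that asks contributors to supply their own context with a custom instruction (the instruction text here is illustrative):

<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" live-preamble="[true,'Describe the scenario the model should assume.']" validates="required" />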

  • review-data (optional)
    • This is the column name from your source data that contains any existing chat history output by the tool. Refer to the "Results" section to see the format.
    • You can add review-data by utilizing the column selector in the graphical editor, or the CML attribute.

 

      • The purpose of review-data is to provide contributors (and the model) with any previous turns so that the contributor and model can carry on the interaction. This is an example of what a job with review-data looks like:

single-model-review-data.png
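A minimal sketch in CML, assuming the prior chat history sits in a source-data column named chat_history (a hypothetical column name):

<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" review-data="{{chat_history}}" validates="required" />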

  • seed (optional)
    • This is the column name from your source data that contains a pre-written prompt. When the job is launched, the contributor will see a pre-written message in the box where they would normally type their prompt. They will not be able to edit this message; they can only send it and then proceed to evaluating the response(s).

The next image shows an example where review-data and seed parameters have been used. The greyed-out USER/RESPONSE text reflects the review-data, while the pre-written prompt in the input field "What should I pack?" reflects the seed data.

eb6e77f2-7ee2-41cc-8bb1-66cbaf40de57.png
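The job in the image above could be produced by CML like the following, where chat_history and seed are hypothetical column names from the source data:

<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" review-data="{{chat_history}}" seed="{{seed}}" validates="required" />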

  • min-turns (optional, defaults to 0)
    • This attribute allows you to control the minimum number of turns a contributor must complete before submitting their judgment. If the contributor fails to meet the specified number of turns, they will receive an error message and their submission will be blocked.
    • Syntax: min-turns="3"

  • max-turns (optional)
    • This attribute allows you to control the maximum number of turns a contributor can complete in a conversation. Once the contributor reaches the specified number of turns, they will be prompted to submit their work and any further messages will be blocked.
    • Syntax: max-turns="5"
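Both attributes can be combined in a single tag, for example to require between three and five turns:

<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" min-turns="3" max-turns="5" validates="required" />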

  • custom-response (optional, defaults to "false")
    • Enables contributors to provide their own response to the prompt in cases where the model response(s) are not optimal. This is so that contributors do not have to accept mediocre or subpar replies.
    • You still have the ability to gather ontology responses and contributor preferences for the model response(s), but the custom response is the one that will be used to continue the interaction. 
    • Enable through CML, or through the graphical editor by checking "Model Response Rewrite" under Custom Configurations.

Screenshot 2024-01-26 at 2.14.59 PM.png
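In CML, the equivalent configuration is:

<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" custom-response="true" validates="required" />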

  • rich (optional, defaults to "false" )
    • This parameter enables contributors to use rich text in the main tool input box. custom-response and live-preamble always include rich text; this attribute additionally lets you choose whether the main input box supports rich text in other scenarios.
  • expanded-data (optional, defaults to "true")
    • By default, or when enabled in the AI Chat Feedback tool, this parameter generates each data field as an individual column in Quality Flow's dataset section, in addition to the standard JSON format.
    • To disable this feature, set expanded-data="false" in the cml:ai_chat_feedback tag.
    • Note: This is currently not supported with the num-responses parameter.
    • For further details, refer to the "Results" section below.

In QA or review jobs you can optionally also use the following parameters:

  • allow-continue (optional, defaults to "false")
    • In QA jobs, contributors are able to access previous rounds of contributor/chatbot interaction and give feedback on each individual response based on the set ontology. This attribute allows you to control whether or not the QA contributor can also send new messages to the chatbot.
  • show-feedback-answers (optional, defaults to "true")
    • This attribute allows you to control whether or not QA contributors can see the previous contributor's model response selections and ontology answers.
  • allow-custom-response-edit (optional, defaults to "false")
    • This attribute allows QA contributors to edit any of the custom responses from a previous conversation.
    • If used in combination with allow-continue, the updated custom responses are sent to the model again.
    • If using these attributes together, it is recommended to make only slight style, spelling, or grammatical adjustments. Changing the custom response too much may cause confusion in subsequent model responses.
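As a sketch, a QA job that lets reviewers continue the chat and touch up custom responses might look like this (chat_history is a hypothetical column holding the original job's output, and the label text is illustrative):

<cml:ai_chat_feedback label="Review chat feedback" name="annotation" source-data="{{source_data}}" review-data="{{chat_history}}" allow-continue="true" allow-custom-response-edit="true" validates="required" />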

Finally, when using multiple models (see Model Response Selection, below) there are two additional parameters:

  • enhanced (optional, defaults to "false")
    • In use cases where contributors are asked to choose between responses from multiple models, this attribute allows them to provide additional detail about their choice.
    • Enable through CML, or through the graphical editor by checking "Enhanced Responses" under Custom Configurations.
  • num-responses="n" (optional, defaults to total number of configured models)
    • In cases where you wish to evaluate many models but only want to present some of the model responses in a given round, you can use this attribute to specify how many responses to collect and present.
    • In each round of the conversation, the tool randomly chooses n of the models you have configured and presents their responses to the contributor.
    • Only the responses randomly selected in each round will appear in the output.
    • For example, with the following CML, the tool will present two responses from models 87, 88, and 89 per round:
<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="[87,88,89]" num-responses="2" validates="required" />

Screenshot 2024-03-04 at 2.13.44 PM.png

Response Level Feedback

If you would like contributors to provide feedback on individual model responses, you can configure ontology questions.

Step 1: Go to "Manage AI Chat Feedback Ontology".

bcb6b934-cfb7-4a37-9422-54a037a7206d.png

 

Step 2: Configure your ontology question(s). 

2f64d27a-d1d5-47dd-8ae5-500f6e6f93a0.png

Once an ontology is configured, the tool will require contributors to provide feedback for each response.

Contributors will not be able to submit their judgment until all required questions have been answered.

c34f5ba6-49c2-41c3-9049-858eaabd6d19.png

 

Model Response Selection

If you have set up multiple models within the tool, contributors will be presented with responses from all the configured models. After answering any ontology questions for each response, they must choose which response they wish to continue the conversation with by selecting the radio button located to the left of the response.

If the responses are very similar and the contributor can't choose between them, they can select the "Responses are near identical" button. The tool will then randomly choose a response to continue the conversation with.

multi-basic.png

Once contributors have chosen the response they prefer, if enhanced="true" is set, they will be asked an additional question about how the responses compare.

<cml:ai_chat_feedback label="Provide chat feedback" name="annotation" source-data="{{source_data}}" enhanced="true" validates="required" />

MicrosoftTeams-image (10).png


Upload Data

Upload data into the job as a CSV where each row represents a conversation to be collected. Your .csv must contain at least one column containing the Model ID.
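For example, a minimal upload might look like the following, where model_id and seed are illustrative column names (only the model ID column is required). Each row becomes a separate conversation task:

model_id,seed
18,"What should I pack?"
19,"What should I pack?"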

Results

1. The results from an <cml:ai_chat_feedback/> job include the entire chat history (both model and user messages), the model ID(s) used, any answers provided by the contributor if ontology questions were configured, and the model response selection if multiple models were configured. If enhanced="true", there will also be a "confidence" field. For example:

{
  "ableToAnnotate": true,
  "annotation": {
    "chat": {
      "0": {
        "id": "0",
        "prompt": "hi there",
        "completion": [{
          "modelId": 18,
          "completion": "Hello! How can I assist you today?",
          "selection": true,
          "confidence": "much-better"
        }, {
          "modelId": 19,
          "completion": "Hello there! How can I assist you today?",
          "selection": false
        }],
        "feedback": {
          "18": [{
            "questionId": "d8ba7201-ee5b-4ef8-8e5c-894711b48e2b",
            "type": "Multiple Choice",
            "name": "is_this_response_factually_correct",
            "answer": {"values": "yes"}
          }],
          "19": [{
            "questionId": "d8ba7201-ee5b-4ef8-8e5c-894711b48e2b",
            "type": "Multiple Choice",
            "name": "is_this_response_factually_correct",
            "answer": {"values": "no"}
          }]
        }
      }
    },
    "enableFeedback": true,
    "model": "[19,18]"
  }
}

2. By default, or when expanded-data="true" is set in the CML tag, along with the above JSON you will also be able to see the data expanded into individual columns of your Quality Flow dataset UI. You can find the broken-out columns in the "Latest Judgment" section of your columns filter:

gif - expanded data.gif

Each JSON field is represented as a separate column.

For example, if you configure "Model 111" and "Model 222," there will be a dedicated column for each model’s responses, one for all user prompts, and so on.

Each cell contains an array, whose length corresponds to the number of rounds the contributor submitted in that judgment.

Note: If there are optional selections (such as custom-response), the array will include "NULL" in any element where the contributor opted not to provide a response for that round. Currently, if the enhanced or custom-response parameters are disabled, the columns will still be included and their elements will be populated with "NULL".
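As an illustration (the column names and the second-round values here are hypothetical), a two-round judgment in which the contributor rewrote only the second response might produce cells like:

user_prompts: ["hi there", "What should I pack?"]
model_18_responses: ["Hello! How can I assist you today?", "Here is a suggested packing list..."]
custom_responses: ["NULL", "Pack light layers and a rain jacket."]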

