queryMessages field added & query generation optimization #653

Open · wants to merge 6 commits into main

Conversation

@Vegoo89 commented Sep 20, 2023

Closes #641

Purpose

  • Added option to use queryMessages to generate optimized search query
  • Option is enabled by default and can be switched off in the Settings; when switched off, the standard history is used as before

Does this introduce a breaking change?

[ ] Yes
[x] No

Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

  • Test the code: build the frontend and run the unit tests
  • Unit tests have been updated and can be run as usual

What to Check

Verify that the following are valid

  • The queryMessages field should be present in the request and response JSON body (see the sketch after this list)
  • New elements should be added to this field as the conversation flows
  • On chat clear, this field's value should be reset
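
For illustration only, here is a minimal sketch of the kind of check described in the first item; it assumes a locally running backend that exposes a /chat endpoint, and the URL, port, and history field shape are assumptions rather than values taken from this PR:

```python
# Hypothetical manual check, not part of this PR's test suite.
import requests

resp = requests.post(
    "http://localhost:50505/chat",  # assumed local backend address
    json={
        "history": [{"user": "What is included in my plan?"}],
        "queryMessages": [],  # empty at the start of a conversation
    },
)
body = resp.json()
assert "queryMessages" in body          # field is present in the response body
assert len(body["queryMessages"]) > 0   # new elements are appended as the conversation flows
```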

Other Information

@Vegoo89 (Author) commented Sep 20, 2023

@microsoft-github-policy-service agree

@Vegoo89 (Author) commented Sep 27, 2023

@pamelafox
Can I ask for a review of this functionality? Thanks!


```python
# STEP 1: Generate an optimized keyword search query based on the chat history and the last question
messages = self.get_messages_from_history(
    self.query_prompt_template,
    self.chatgpt_model,
    history,
    query_history_input,
```
Collaborator

Our prompt says "Below is a history of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base about employee healthcare plans and the employee handbook."
I'm surprised you got good results by passing in the query history since it would seem to be in disagreement with the prompt. You didn't need to alter the prompt at all?

Author

Hmm, yeah, our prompt is different, customizable for every use case. But is it in disagreement with this prompt? I don't think so, since the structure contains history, but only in the form of user query messages and bot-generated queries. This is basically a list of few-shot examples (a rough sketch below).
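
To make that concrete, here is an illustrative sketch of what such a few-shot query history could look like; the example questions and generated queries are made up from the demo's healthcare-plan sample domain, and the variable names are not taken from the PR:

```python
# Hypothetical queryMessages content: prior user questions paired with the
# search queries that were generated for them, acting as few-shot examples.
query_messages = [
    {"role": "user", "content": "What is included in my Northwind Health Plus plan?"},
    {"role": "assistant", "content": "Northwind Health Plus plan coverage"},
    {"role": "user", "content": "Does it cover eye exams?"},
    {"role": "assistant", "content": "Northwind Health Plus eye exam coverage"},
]

# The new question is appended after these pairs, so the model imitates the
# earlier question -> query mappings when generating the next search query.
new_question = "How do I file a claim?"
messages = query_messages + [{"role": "user", "content": new_question}]
```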

@pamelafox (Collaborator)

Could you give examples of query generations that work better after this change? I'm looking into adding evaluation metrics to this repository so we can measure changes like this, but it's difficult to evaluate without good test data.

@Vegoo89 (Author) commented Sep 29, 2023

> Could you give examples of query generations that work better after this change? I'm looking into adding evaluation metrics to this repository so we can measure changes like this, but it's difficult to evaluate without good test data.

Most test queries in our benchmarks are more stable after this change was implemented, but we have somewhat different use cases, where different teams can alter queries for their needs.

We perform single-query and conversation tests with OpenAI rating / similarity evaluation.

I don't know which examples you would like, but I can't paste mine here due to internal company policies (these are internal data sets).

@pamelafox (Collaborator)

> > Could you give examples of query generations that work better after this change? I'm looking into adding evaluation metrics to this repository so we can measure changes like this, but it's difficult to evaluate without good test data.
>
> Most test queries in our benchmarks are more stable after this change was implemented, but we have somewhat different use cases, where different teams can alter queries for their needs.
>
> We perform single-query and conversation tests with OpenAI rating / similarity evaluation.
>
> I don't know which examples you would like, but I can't paste mine here due to internal company policies (these are internal data sets).

Okay, thanks for the additional information! I think your code change looks good, but I want to evaluate it using a new evaluation pipeline I'm working on in another branch. I'll add multi-turn evaluation to it soon which will enable me to test out this change. Sorry for the delay, but this is a great opportunity to try that out.

Also, if you can share anything about how you run evaluations, I'd love to hear more, as we're trying to figure out good developer flows for evaluation locally and in CI/CD.

@Vegoo89 (Author) commented Oct 4, 2023

> Okay, thanks for the additional information! I think your code change looks good, but I want to evaluate it using a new evaluation pipeline I'm working on in another branch. I'll add multi-turn evaluation to it soon, which will enable me to test out this change. Sorry for the delay, but this is a great opportunity to try that out.
>
> Also, if you can share anything about how you run evaluations, I'd love to hear more, as we're trying to figure out good developer flows for evaluation locally and in CI/CD.

Sure, I will wait until you try to run it; then you can ping me and I can rebase/merge the latest changes into my branch so there are no conflicts.

About evaluation, there are many possibilities, but GPT is pretty good at such tasks, so you can perform standard evaluation by e.g. calculating embeddings of the ground truth and the bot answer and then comparing them with a basic similarity metric (cosine, Euclidean), and you can also leverage a GPT model and ask it to compare the ground truth vs. the bot answer on a scale of your choosing (just add some few-shot examples so it knows what to do); a rough sketch is below. I'm pretty sure you already have similar ideas in mind, so you can use a few scores and blend them to get a 'final' one, or just use a single metric.
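
To illustrate the embedding-similarity part, here is a minimal sketch; it assumes the embedding vectors for the ground truth and the bot answer have already been obtained from whatever embeddings model you use, and none of these names come from the repository:

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Basic similarity metric between two embedding vectors."""
    a_vec, b_vec = np.asarray(a), np.asarray(b)
    return float(np.dot(a_vec, b_vec) / (np.linalg.norm(a_vec) * np.linalg.norm(b_vec)))

# ground_truth_vec and bot_answer_vec would come from an embeddings model
# (obtaining them is out of scope for this sketch).
# score = cosine_similarity(ground_truth_vec, bot_answer_vec)
#
# A GPT-graded score on a fixed scale (with a few-shot rubric in the prompt)
# can then be blended with this similarity score into a single final metric.
```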

For us, stability of the solution is the most important thing. We all know that when you screw up even a single part of the conversation, it is still kept in the history and may break things later on, so in our tests we focus mostly on that.

Also, I want to mention that our tests are nowhere near perfect or complete. We are still evolving them and adjusting them to our needs, so I am also waiting to see your approach :)

github-actions bot commented Dec 4, 2023

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed.

@github-actions bot added the Stale label on Dec 4, 2023
Labels: Stale
Projects: None yet
Development: Successfully merging this pull request may close these issues: Better generation of optimized search query
2 participants