-
Notifications
You must be signed in to change notification settings - Fork 2
Update question router evaluation rake task to return the answer message #608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…o also return the answer message The evaluation:generate_question_routing_response task now returns the answer message in its output. This is needed for downstream evaluation of the question router behaviour.
…ned answer Adjusted the specs for evaluation:generate_question_routing_response to verify that the 'answer' field is returned correctly. 'answer' is nil for genuine_rag classification and includes the message for other classifications. This ensures tests match the updated task behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work on the Ruby!
Just a couple of changes for long lines (and I think we can drop a test)
spec/lib/tasks/evaluation_spec.rb
Outdated
end | ||
end | ||
|
||
it "the answer message is null when classification is genuine_rag" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this test as we're not actually testing a conditional in the code.
But a tip for future: For these RSpec tests use the it
method to form a sentence. So for this it would probably be it "has a nil answer for a genuine_rag classification"
spec/lib/tasks/evaluation_spec.rb
Outdated
it "outputs the response as JSON to stdout" do | ||
ClimateControl.modify(INPUT: input) do | ||
answer = build(:answer, question_routing_label: "genuine_rag", question_routing_confidence_score: 0.2) | ||
answer = build(:answer, question_routing_label: "unclear_intent", question_routing_confidence_score: 0.2, message: "Sorry, can you say that again?") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
answer = build(:answer, question_routing_label: "unclear_intent", question_routing_confidence_score: 0.2, message: "Sorry, can you say that again?") | |
answer = build(:answer, | |
question_routing_label: "unclear_intent", | |
question_routing_confidence_score: 0.2, | |
message: "Sorry, can you say that again?") |
Just because this line has got a little long
spec/lib/tasks/evaluation_spec.rb
Outdated
expect { Rake::Task[task_name].invoke("openai") } | ||
.to output("{\"classification\":\"genuine_rag\",\"confidence_score\":0.2}\n").to_stdout | ||
.to output("{\"classification\":\"unclear_intent\",\"confidence_score\":0.2,\"answer\":\"Sorry, can you say that again?\"}\n").to_stdout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expect { Rake::Task[task_name].invoke("openai") } | |
.to output("{\"classification\":\"genuine_rag\",\"confidence_score\":0.2}\n").to_stdout | |
.to output("{\"classification\":\"unclear_intent\",\"confidence_score\":0.2,\"answer\":\"Sorry, can you say that again?\"}\n").to_stdout | |
expected_output = { | |
classification: "unclear_intent", | |
confidence_score: 0.2, | |
answer: "Sorry, can you say that again?", | |
} | |
expect { Rake::Task[task_name].invoke("openai") } | |
.to output("#{expected_output.to_json}\n").to_stdout |
Sorry this suggestion hasn't quite worked out properly because you only changed one line, but I think the line had got a bit long so it's best to pull the data out and convert to json
…onse Remove unnecessary test and shorten long lines following reviewer's comments
Thanks @kevindew. I think I have implemented the changes you suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super, nice work 😎
This is so that the downstream evaluation pipeline works as expected as for alphagov/govuk-chat-evaluation#73