LLM-as-a-judge: Correctness (DB)

[metric](https://docs.google.com/document/d/1_EH6laBvpGTbLZ1nFYEX_onWEOB9lM6kTp1jnCA1c5M/edit?tab=t.yvwr52g8om7m#heading=h.lxlxradknwfe)

We want to assess whether the generated answer is correct given the information provided in our [QA calibration tab](https://docs.google.com/spreadsheets/d/1iM0lHyey42QmbsLyNn0DsLL08d7fW1UXTjfTdUZnT0Q/edit?gid=1183789496#gid=1183789496).

Correctness should only consider the facts defined in the "expected answer" column, relative to the generated answer. The judge should take the expected answer and check whether the facts it contains are also present in the generated answer.
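To make the check concrete, here is a minimal sketch of how such a judge could be wired up. Everything in it is an assumption for illustration and not an agreed design: the prompt wording, the JSON response shape, the function name `judge_correctness`, and the `call_llm` callable (a stand-in for whatever model client the DB bot ends up using). The instruction not to penalize extra information is inferred from the rule that only facts in the expected answer count.

```python
import json
from typing import Callable

# Hypothetical prompt; the wording and JSON shape are assumptions, not a spec.
JUDGE_PROMPT = """You are grading a generated answer for correctness.
List each fact stated in the expected answer and mark whether the
generated answer also states it. Do not penalize extra information
in the generated answer.
Respond with JSON only, in this shape:
{{"facts": [{{"fact": "...", "present": true}}]}}

Expected answer:
{expected}

Generated answer:
{generated}
"""


def judge_correctness(
    expected: str,
    generated: str,
    call_llm: Callable[[str], str],  # hypothetical: takes a prompt, returns raw model text
) -> dict:
    """Return a verdict plus the expected facts the generated answer is missing."""
    prompt = JUDGE_PROMPT.format(expected=expected, generated=generated)
    result = json.loads(call_llm(prompt))
    facts = result.get("facts", [])
    # Collect every expected fact the judge did not mark as present.
    missing = [f.get("fact", "") for f in facts if f.get("present") is not True]
    # Correct only if the judge listed at least one fact and none are missing.
    return {"correct": bool(facts) and not missing, "missing": missing}
```

Deriving the verdict from the per-fact flags, instead of asking the model for a single yes/no, keeps the result tied to the expected answer and makes a failing case easy to inspect against the calibration tab.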

**THIS IS FOR THE DB BOT ONLY; THE NAV BOT WILL HAVE A SEPARATE TICKET**

LLM-as-a-judge: Correctness (DB) #29
