A simple CLI is also set up for testing. It provides two tools:
- ./runner-cli/runner.ts: takes a CSV representing a conversation and outputs an HTML file containing the summary. HTML is the best format for viewing the summary because hovering over a citation shows the original comment and its votes.
- ./runner-cli/rerunner.ts: takes a CSV representing a conversation, reruns summarization a number of times, and outputs all of the summaries in one CSV. This is useful for testing consistency.
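
The rerun-and-collect idea behind the second tool can be sketched as follows. This is a minimal illustration, not the contents of rerunner.ts: the `Summarize` type, `summariesToCsv`, `rerun`, and the `run,summary` column layout are all assumptions made for the example.

```typescript
// Hypothetical stand-in for the library's summarization call, which is not shown here.
type Summarize = (comments: string[]) => Promise<string>;

// Quote a CSV field: wrap it in double quotes and double any embedded quotes.
function csvField(value: string): string {
  return `"${value.replace(/"/g, '""')}"`;
}

// Build a CSV with a header row and one row per run (assumed layout: run,summary).
function summariesToCsv(summaries: string[]): string {
  const rows = summaries.map((summary, i) => `${i + 1},${csvField(summary)}`);
  return ["run,summary"].concat(rows).join("\n");
}

// Call `summarize` on the same input `runs` times and collect every result.
async function rerun(
  summarize: Summarize,
  comments: string[],
  runs: number
): Promise<string> {
  const summaries: string[] = [];
  for (let i = 0; i < runs; i++) {
    summaries.push(await summarize(comments));
  }
  return summariesToCsv(summaries);
}
```

Keeping every run in one file makes it easy to compare the summaries side by side and spot where the output varies between runs.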
## Running the Checks
The ./evals directory contains a number of checks that can be run on an unlabeled conversation. They fall into three categories:
- Monitoring Checks: summary generation failure rate and time to run
- Quick Checks: whether the summary has an intro and conclusion, and whether all the topics and subtopics from categorization are present
- Qualitative Checks: how often each group is mentioned

All three categories of checks are run using the ./evals/run_checks.ts script.
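
To make the three categories concrete, here is a rough sketch of one check from each. The function names, inputs, and substring-matching heuristics are assumptions for illustration only; they are not taken from ./evals and the real checks may work differently.

```typescript
// Hypothetical helpers; names and heuristics are illustrative, not the code in ./evals.

// Monitoring check: fraction of summary generation attempts that failed.
// Each entry in `succeeded` is true when that attempt produced a summary.
function failureRate(succeeded: boolean[]): number {
  if (succeeded.length === 0) return 0;
  const failures = succeeded.filter((ok) => !ok).length;
  return failures / succeeded.length;
}

// Quick check: topic and subtopic names that never appear in the summary text.
function missingTopics(summary: string, topics: string[]): string[] {
  const haystack = summary.toLowerCase();
  return topics.filter((topic) => haystack.indexOf(topic.toLowerCase()) === -1);
}

// Qualitative check: case-insensitive count of how often each group name appears.
function groupMentionCounts(
  summary: string,
  groups: string[]
): Record<string, number> {
  const haystack = summary.toLowerCase();
  const counts: Record<string, number> = {};
  for (const group of groups) {
    const needle = group.toLowerCase();
    // Splitting on the name yields one more piece than there are occurrences.
    counts[group] = needle.length === 0 ? 0 : haystack.split(needle).length - 1;
  }
  return counts;
}
```

Plain substring matching keeps the sketch short, but it would also count a group named "Group A" inside "Group Alpha", so a stricter check would likely match on word boundaries.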