plot_results(): Are there any frameworks that allow summarising and visualising inspect logs? #704

sohaibimran7 · 2024-10-16T09:41:10Z

Many evaluation tools have frameworks for summarising and visualising results. An example is zeno for lm-eval-harness. I understand that results summarisation and visualisation needs can be quite diverse and that one tool may not work for everyone. Still, I think that if inspect ai logs could be easily summarised and visualised, researchers could iterate faster.
I wrote a very quick and dirty class for visualising a list of EvalLogInfos for my own experiments, and was wondering what other people use and whether there is interest in results summarisation and visualisation support for inspect.

jjallaire-aisi · 2024-10-16T09:53:26Z

This is definitely something we are interested in supporting more deeply! We are soon going to make it possible to run analysis code on top of an eval-set and then display the results in the viewer. At the same time, we will hopefully discover some useful common idioms and tools that we can provide. Would love to hear from people on this thread about what the general shape of the requirements is!

sohaibimran7 · 2024-10-16T12:31:27Z

I personally would value the following in a visualisation framework:

Ability to categorise logs by {log_dir, run_id, task, dataset, scorer and model}
More finely categorise based on substrings of {model, task, log_dir}
Ability to rename categories and their elements, and to sort and filter by category using custom sort and filter functions.
Ability to map each category to a plotting element {x axis, y axis, x offset, y offset, colour, horizontal and vertical faceting in a multi-plot figure}
Ability to plot any figure I like (bar charts, box plots, violins etc.)
Extensibility
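
To make the shape of these requirements concrete, here is a minimal sketch of how categorising, renaming, filtering, sorting and channel mapping might fit together. It assumes each log has already been flattened into a plain dict with `log_dir`, `task`, `model` and `score` fields. Every name in it (`Category`, `substring_category`, `summarise`, the record fields and the placeholder model names) is hypothetical and not part of inspect_ai. The output is a list of plain rows that could be handed to whichever plotting library is preferred.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class Category:
    """A named way of deriving a category value from a record (hypothetical)."""
    name: str
    key: Callable[[dict], str]       # extracts the raw category value
    rename: Optional[dict] = None    # optional display names for raw values

    def value(self, rec: dict) -> str:
        raw = self.key(rec)
        return (self.rename or {}).get(raw, raw)


def substring_category(name, field, substrings, default="other", rename=None):
    """Categorise by the first of `substrings` found in rec[field]."""
    def key(rec):
        return next((s for s in substrings if s in rec[field]), default)
    return Category(name, key, rename)


def summarise(records, channels, keep=lambda rec: True, sort_key=None):
    """Group records by the categories mapped to plot channels.

    `channels` maps a plotting element ("x", "colour", "facet_col", ...)
    to a Category. Returns one row per group with the mean score and count.
    """
    groups = defaultdict(list)
    for rec in filter(keep, records):
        cell = tuple(cat.value(rec) for cat in channels.values())
        groups[cell].append(rec["score"])
    rows = [
        {**dict(zip(channels, cell)), "mean": mean(scores), "n": len(scores)}
        for cell, scores in groups.items()
    ]
    # Default ordering: sort by channel values in the order they were given.
    default_sort = lambda row: tuple(row[c] for c in channels)
    return sorted(rows, key=sort_key or default_sort)


# Made-up records standing in for flattened eval logs (assumption).
records = [
    {"log_dir": "logs/baseline", "task": "arith", "model": "provider_a/small", "score": 0.5},
    {"log_dir": "logs/baseline", "task": "arith", "model": "provider_a/small", "score": 0.75},
    {"log_dir": "logs/finetuned", "task": "arith", "model": "provider_b/large", "score": 1.0},
    {"log_dir": "logs/finetuned", "task": "recall", "model": "provider_b/large", "score": 0.25},
]

channels = {
    "x": Category("task", lambda rec: rec["task"]),
    "colour": substring_category(
        "provider", "model", ["provider_a", "provider_b"],
        rename={"provider_a": "A", "provider_b": "B"},
    ),
}

rows = summarise(records, channels)
finetuned_only = summarise(
    records, channels, keep=lambda rec: rec["log_dir"].endswith("finetuned")
)
```

Keeping the grouping step separate from the drawing step is one way to cover the "plot any figure I like" and "extensibility" points: the rows carry only channel assignments, so a bar chart, box plot or faceted figure can each consume the same structure.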
