Many evaluation tools come with frameworks for summarising and visualising results. An example is Zeno for lm-eval-harness. I understand that needs for results summarisation and visualisation can be quite diverse, and one tool may not work for everyone. Still, I think that if Inspect AI logs could be easily summarised and visualised, researchers could iterate faster.
I wrote a very quick and dirty class for visualising a list of EvalLogInfos for my own experiments. I was wondering what other people use, and whether there is interest in results summarisation and visualisation support for Inspect.
This is definitely something we are interested in supporting more deeply! We are soon going to make it possible to run a set of analysis code on top of an eval-set and then display the output in the viewer. At the same time, we will hopefully discover some useful common idioms and tools that we can provide. Would love to hear from people on this thread about what the general shape of the requirements is!
I personally would value the following in a visualisation framework:
Ability to categorise logs by {log_dir, run_id, task, dataset, scorer and model}
Ability to categorise more finely based on substrings of {model, task, log_dir}
Ability to rename categories and their elements and sort and filter by categories using custom sort and filter functions.
Ability to map each category to a plotting element {x axis, y axis, x offset, y offset, colour, horizontal and vertical faceting in a multi-plot figure}
Ability to plot any figure I like (bar charts, box plots, violin plots, etc.)
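To make the wish list above a bit more concrete, here is a minimal sketch of the categorise / rename / filter / map-to-plot-role idea using only the Python standard library. It does not use the Inspect API: the flat record dicts, the `Category` class, the `summarise` function, and the `PLOT_ROLES` mapping are all hypothetical names invented for illustration, and the scores are made-up placeholder numbers. In practice the records would have to be extracted from the eval logs first.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict

# Hypothetical flattened log records (NOT the Inspect log schema);
# the scores are placeholder values for illustration only.
RECORDS = [
    {"log_dir": "logs/run_a", "task": "task_one", "model": "vendor_a/model-x", "score": 0.90},
    {"log_dir": "logs/run_a", "task": "task_one", "model": "vendor_a/model-y", "score": 0.80},
    {"log_dir": "logs/run_b", "task": "task_two", "model": "vendor_a/model-x", "score": 0.95},
    {"log_dir": "logs/run_b", "task": "task_two", "model": "vendor_b/model-z", "score": 0.97},
]


@dataclass
class Category:
    """A named way of labelling a record, with optional display renames."""
    name: str
    key: Callable[[dict], str]
    rename: Dict[str, str] = field(default_factory=dict)

    def label(self, record: dict) -> str:
        raw = self.key(record)
        return self.rename.get(raw, raw)


def summarise(records, categories, keep=lambda record: True):
    """Mean score per combination of category labels, sorted by label."""
    groups = defaultdict(list)
    for record in filter(keep, records):
        labels = tuple(category.label(record) for category in categories)
        groups[labels].append(record["score"])
    return {labels: mean(scores) for labels, scores in sorted(groups.items())}


# Categorise by a substring of the model name (the part before the slash)
# and rename the resulting labels for display.
vendor = Category(
    "vendor",
    key=lambda r: r["model"].split("/")[0],
    rename={"vendor_a": "Vendor A", "vendor_b": "Vendor B"},
)
task = Category("task", key=lambda r: r["task"])

# Map categories to plotting roles; a plotting layer would consume this.
PLOT_ROLES = {"x": task, "colour": vendor}

summary = summarise(RECORDS, [PLOT_ROLES["x"], PLOT_ROLES["colour"]])
run_a_only = summarise(RECORDS, [task], keep=lambda r: "run_a" in r["log_dir"])

for labels, value in summary.items():
    print(labels, round(value, 3))
```

The resulting dictionary is keyed by one label per plotting role, so it could be handed to any plotting library for bar charts, box plots, or faceted figures. Custom sorting could be added by passing a `key` function to `sorted`, which this sketch leaves out for brevity.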