Validation error when reading Eval log #834

Open · us2547 opened this issue Nov 12, 2024 · 5 comments

us2547 commented Nov 12, 2024

When using read_eval_log to read an EvalLog (JSON), I receive the error below. The problem is not consistent; it happens only rarely. Also, when using the EvalLog object returned directly from the run (rather than reading it from a file), iterating over the object works fine.

One observation: the problem seems to be related to a scorer that outputs a string value and doesn't have a metric. It also appears related to issue #775.

@scorer(metrics=[])
def problem_type(model: Model):
    ...
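For reference, a complete string-valued scorer with no metrics would typically look like this in inspect_ai (a minimal sketch; the name and the returned label are illustrative, not my actual code):

from inspect_ai.scorer import Score, Target, scorer
from inspect_ai.solver import TaskState

@scorer(metrics=[])
def problem_type_example():
    async def score(state: TaskState, target: Target) -> Score:
        # String-valued score with no metrics, mirroring the scorer above;
        # the real classification logic is omitted.
        return Score(value="some-category")
    return score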

Unfortunately, the problematic file is very large.
An example error trace is below; index "2" (scores.2, sample_reductions.2) corresponds to the scorer that outputs a string.

File "/usr/local/stage3technical/var/virtualenv/tcom-middle-tier-10-26-24/lib/python3.11/site-packages/inspect_ai/log/_file.py", line 201, in _read_header_streaming
    results = EvalResults(**v)
              ^^^^^^^^^^^^^^^^
  File "/usr/local/stage3technical/var/virtualenv/tcom-middle-tier-10-26-24/lib/python3.11/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 8 validation errors for EvalResults
scores.2.metrics.accuracy.value.int
  Input should be a valid integer [type=int_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/int_type
scores.2.metrics.accuracy.value.float
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/float_type
sample_reductions.2.samples.165.value.str
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/string_type
sample_reductions.2.samples.165.value.int
  Input should be a valid integer [type=int_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/int_type
sample_reductions.2.samples.165.value.float
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/float_type
sample_reductions.2.samples.165.value.bool
  Input should be a valid boolean [type=bool_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/bool_type
sample_reductions.2.samples.165.value.list[union[str,int,float,bool]]
  Input should be a valid list [type=list_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/list_type
sample_reductions.2.samples.165.value.dict[str,nullable[union[str,int,float,bool]]]
  Input should be a valid dictionary [type=dict_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/dict_type

jjallaire-aisi (Collaborator) commented:

@dragonstyle Could you take a look at this?

us2547 (Author) commented Nov 12, 2024

It seems the problem is not linked only to the scorer that outputs a string. After commenting that scorer out, the problem was reproduced (with a different scorer failing validation). A run on the same dataset but with a limited number of samples works, so I suspect the problem is somehow related to a specific sample. Is there a way to debug the EvalLog to identify the root cause? The problem was reproduced with the latest inspect version.

us2547 (Author) commented Nov 13, 2024

I believe the root cause of the problem is the scorer outputting "value" as null. After changing the null values to zero, the log parser works.

"metrics": {
          "accuracy": {
            "name": "accuracy",
            "value": null,
            "options": {}
          }
        }

or this, from "samples":

{
  "value": null,
  "answer": "Answer .....",
  "explanation": "Explanation .....",
  "metadata": {
    "faithfulness_score": null
  },
  "sample_id": "id-2024-5"
},
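As a workaround sketch (the helper below is hypothetical, not part of inspect_ai), NaN can be coerced to a numeric default before the Score is built, which is equivalent to the null-to-zero edit described above:

import math

def nan_to_zero(v: float) -> float:
    # Hypothetical guard, not part of inspect_ai: a NaN score value
    # serializes to JSON null and then fails EvalLog validation on read,
    # so replace it with 0.0 before constructing the Score.
    return 0.0 if isinstance(v, float) and math.isnan(v) else v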

dragonstyle (Collaborator) commented:

Is the issue that the value in some cases is being returned as NaN? I could see that we would serialize that to null, and our type validation wouldn't allow that to pass, since null isn't a valid value for a score.
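The NaN-to-null step would match pydantic v2's default JSON serialization (a quick illustrative check, not inspect-ai code; the model here stands in for the real Score type):

from pydantic import BaseModel

class ScoreLike(BaseModel):
    value: float

# pydantic v2 serializes NaN (and inf) to JSON null by default
# (the ser_json_inf_nan setting), so a NaN value round-trips as null.
print(ScoreLike(value=float("nan")).model_dump_json())  # {"value":null}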

us2547 (Author) commented Nov 14, 2024

The value returned was numpy.nan, which was serialized to JSON null by inspect-ai. The Score class allows setting value to numpy.nan, and there are no warnings or errors when doing so; the error appears only when reading the log with the inspect utility.
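A minimal sketch of the silent-write/failing-read asymmetry (this assumes only the public Score class; note that numpy.nan is just a Python float NaN, so it passes float validation):

from inspect_ai.scorer import Score

# Accepted without warning at write time: NaN is a valid Python float.
s = Score(value=float("nan"))

# The NaN serializes to JSON null, so the error only surfaces later,
# when read_eval_log re-validates the stored file.
print(s.model_dump_json())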

dragonstyle self-assigned this Nov 14, 2024