Validation error when reading Eval log #834

Open · us2547 opened this issue Nov 12, 2024 · 5 comments

us2547 commented Nov 12, 2024

When using read_eval_log to read an EvalLog (JSON), I receive the error below. The problem is not consistent; it happens only rarely. Also, when using the EvalLog object returned directly from the run (rather than reading it from a file), iterating over the object works fine.

One observation: the problem seems to be related to a scorer that outputs a string value and doesn't have a metric. It also appears related to issue #775.

@scorer(metrics=[])
def problem_type(model: Model):
    ...
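For reference, a complete string-valued scorer with no metrics would typically look like this in inspect_ai (a minimal sketch; the name and the returned label are illustrative, not my actual code):

from inspect_ai.scorer import Score, Target, scorer
from inspect_ai.solver import TaskState

@scorer(metrics=[])
def problem_type_example():
    async def score(state: TaskState, target: Target) -> Score:
        # String-valued score with no metrics, mirroring the scorer above;
        # the real classification logic is omitted.
        return Score(value="some-category")
    return score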

Unfortunately, the problematic file is very large.
An example error trace is below; index "2" (scores.2, sample_reductions.2) corresponds to the scorer that outputs a string.

File "/usr/local/stage3technical/var/virtualenv/tcom-middle-tier-10-26-24/lib/python3.11/site-packages/inspect_ai/log/_file.py", line 201, in _read_header_streaming
    results = EvalResults(**v)
              ^^^^^^^^^^^^^^^^
  File "/usr/local/stage3technical/var/virtualenv/tcom-middle-tier-10-26-24/lib/python3.11/site-packages/pydantic/main.py", line 193, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 8 validation errors for EvalResults
scores.2.metrics.accuracy.value.int
  Input should be a valid integer [type=int_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/int_type
scores.2.metrics.accuracy.value.float
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/float_type
sample_reductions.2.samples.165.value.str
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/string_type
sample_reductions.2.samples.165.value.int
  Input should be a valid integer [type=int_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/int_type
sample_reductions.2.samples.165.value.float
  Input should be a valid number [type=float_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/float_type
sample_reductions.2.samples.165.value.bool
  Input should be a valid boolean [type=bool_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/bool_type
sample_reductions.2.samples.165.value.list[union[str,int,float,bool]]
  Input should be a valid list [type=list_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/list_type
sample_reductions.2.samples.165.value.dict[str,nullable[union[str,int,float,bool]]]
  Input should be a valid dictionary [type=dict_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.8/v/dict_type

jjallaire-aisi (Collaborator) commented:

@dragonstyle Could you take a look at this?

us2547 (Author) commented Nov 12, 2024

It seems the problem is not linked only to the scorer that outputs a string. After commenting that scorer out, the problem was reproduced (with a different scorer failing validation). A run on the same dataset but with a limited number of samples works, so I suspect the problem is somehow related to a specific sample. Is there a way to debug the EvalLog to identify the root cause? The problem was reproduced with the latest inspect version.

us2547 (Author) commented Nov 13, 2024

I believe the root cause of the problem is the scorer outputting "value" as null. After changing the null values to zero, the log parser works.

"metrics": {
          "accuracy": {
            "name": "accuracy",
            "value": null,
            "options": {}
          }
        }

or this, from "samples":

{
  "value": null,
  "answer": "Answer .....",
  "explanation": "Explanation .....",
  "metadata": {
    "faithfulness_score": null
  },
  "sample_id": "id-2024-5"
},
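As a workaround sketch (the helper below is hypothetical, not part of inspect_ai), NaN can be coerced to a numeric default before the Score is built, which is equivalent to the null-to-zero edit described above:

import math

def nan_to_zero(v: float) -> float:
    # Hypothetical guard, not part of inspect_ai: a NaN score value
    # serializes to JSON null and then fails EvalLog validation on read,
    # so replace it with 0.0 before constructing the Score.
    return 0.0 if isinstance(v, float) and math.isnan(v) else v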

dragonstyle (Collaborator) commented:

Is the issue that the value in some cases is being returned as NaN? I could see that we would serialize that to null, and our type validation wouldn't allow that to pass, since null isn't a valid value for a score.
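The NaN-to-null step would match pydantic v2's default JSON serialization (a quick illustrative check, not inspect-ai code; the model here stands in for the real Score type):

from pydantic import BaseModel

class ScoreLike(BaseModel):
    value: float

# pydantic v2 serializes NaN (and inf) to JSON null by default
# (the ser_json_inf_nan setting), so a NaN value round-trips as null.
print(ScoreLike(value=float("nan")).model_dump_json())  # {"value":null}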

us2547 (Author) commented Nov 14, 2024

The value returned was numpy.nan, which was serialized to JSON null by inspect-ai. The Score class allows setting value to numpy.nan, and there are no warnings or errors when doing so; the error appears only when reading the log with the inspect utility.
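A minimal sketch of the silent-write/failing-read asymmetry (this assumes only the public Score class; note that numpy.nan is just a Python float NaN, so it passes float validation):

from inspect_ai.scorer import Score

# Accepted without warning at write time: NaN is a valid Python float.
s = Score(value=float("nan"))

# The NaN serializes to JSON null, so the error only surfaces later,
# when read_eval_log re-validates the stored file.
print(s.model_dump_json())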

dragonstyle self-assigned this Nov 14, 2024