[Question] Custom metric type #775

Open
us2547 opened this issue Oct 30, 2024 · 6 comments · Fixed by #785 or #799
@us2547

us2547 commented Oct 30, 2024

Hi,
Is there a way to define a custom metric that outputs a dictionary or tuple?
I need a metric that aggregates the scorer's values and reports how many times each value occurs.
The values in my case are strings.
It seems that the only output type a metric supports is float.
Here is an example metric that is currently not supported:

from inspect_ai.scorer import Metric, Score, metric

@metric
def item_count() -> Metric:
    """
    Currently not working. Inspect expects a float value for the metric.
    """
    def metric(scores: list[Score]) -> tuple:
        # Count occurrences of each score value, keyed by its string form.
        count_dict: dict[str, int] = {}
        for score in scores:
            score_value = str(score.value)
            count_dict[score_value] = count_dict.get(score_value, 0) + 1
        return tuple(count_dict.items())
    return metric

Thank you.

@jjallaire
Collaborator

cc @dragonstyle

@dragonstyle
Collaborator

What version of Inspect are you using? Metrics in recent versions of Inspect are allowed to return any of the following types:

str | int | float | bool
list[str | int | float | bool]
dict[str, str | int | float | bool | None]

The elements of the system downstream should automatically handle metrics that return lists or dicts, hopefully behaving as you'd expect them to.
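
As a rough illustration of the `dict[str, ...]` return shape listed above, the counting logic from the original question could be written to return a dictionary instead of a tuple. This is a minimal sketch, not Inspect's own code: the `Score` class here is a hypothetical stand-in that models only the `value` field, so the example runs without `inspect_ai` installed.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Score:
    """Hypothetical stand-in for inspect_ai's Score; only `value` is modelled."""

    value: str | int | float | bool


def item_count(scores: list[Score]) -> dict[str, int]:
    """Count how often each score value occurs, keyed by its string form."""
    return dict(Counter(str(score.value) for score in scores))


if __name__ == "__main__":
    print(item_count([Score("red"), Score("blue"), Score("red")]))
```

In Inspect itself, a function like this would be the inner function returned from an `@metric`-decorated factory, as in the snippet in the original question.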

@us2547
Author

us2547 commented Oct 31, 2024

I'm using the latest Inspect, version inspect_ai==0.3.42. I have a custom scorer that returns the following Score (example):

        # Debug
        scorer_value = "test-value"
        return_score = Score(
            value=scorer_value,
            answer=state.output.completion,
            explanation=scorer_explanation,
            metadata=metadata,
        )
        return return_score

The "value" is correctly shown as "test-value" in the JSON and in inspect view. However, when I use a custom metric, by the time Score.value reaches it the value has been converted to a float and shows as 0.0. Is there a way to pass string values to a custom metric for processing?

@dragonstyle
Collaborator

I can confirm that I'm seeing the same issue (the score arrives with a float value in the metric). I am investigating now, but I suspect the issue is related to how we've implemented sample reducing (and it not dealing very nicely with strings, by default). I'll respond here once I've buttoned this down - sorry for the issue and thanks for reporting it.

@dragonstyle dragonstyle self-assigned this Oct 31, 2024
dragonstyle added a commit that referenced this issue Oct 31, 2024
If a user produces a score whose value is a string, that value is coerced to a float when it is ‘reduced’ using the default mean reducer. For strings, this means that when the Score arrives at the custom metric, it carries the reduced value, which has already been coerced to a float.

This fix is minimal - it implements support for string values in the mean reducer, providing the most common string value (or the first string value if none is most common).

Fixes #775
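
The behaviour described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the commit: both function names are hypothetical, the 0.0 fallback for unparseable strings is inferred from the symptom reported above, and "no most common value" is interpreted as a tie for the highest count.

```python
from collections import Counter


def naive_mean_reduce(values: list[object]) -> float:
    """Hypothetical mean reducer that coerces every value to a float first."""
    if not values:
        raise ValueError("values must be non-empty")

    def to_float(value: object) -> float:
        try:
            return float(value)  # type: ignore[arg-type]
        except (TypeError, ValueError):
            # Assumption: strings that are not numbers collapse to 0.0.
            return 0.0

    return sum(to_float(v) for v in values) / len(values)


def string_aware_reduce(values: list[str]) -> str:
    """Return the most common string, or the first one when the top count is tied."""
    if not values:
        raise ValueError("values must be non-empty")
    ranked = Counter(values).most_common(2)
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    return values[0]
```

With a single epoch, `naive_mean_reduce(["test-value"])` yields `0.0`, which matches the value seen in the custom metric, while `string_aware_reduce(["test-value"])` keeps the string intact.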
jjallaire pushed a commit that referenced this issue Oct 31, 2024
@dragonstyle dragonstyle reopened this Nov 4, 2024
@dragonstyle
Collaborator

I'm reverting the original fix to this as it caused additional regressions elsewhere. I will attempt to fix again today.

@dragonstyle
Collaborator

We ended up reverting the second fix to this issue as well. I will take another crack at it soon!
