The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
NathanHB left a comment:
Looks good! Ungodly amount of regex logic lmao, but I guess there's no way around it ^^'
 try:
     add_to_specifics_with_timeout(formatted_doc, extracted_predictions, extracted_golds)
-except:  # noqa: E722
+except Exception:  # noqa: E722
Why add `Exception` if it isn't used?
Some exceptions don't inherit from `Exception` (e.g., `KeyboardInterrupt`, which derives directly from `BaseException`). Those shouldn't be caught, because we want the program to stop when the user interrupts it (e.g., with Ctrl+C).
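To make the distinction concrete, here is a small stand-alone Python sketch. The helpers `run_guarded` and `simulate_ctrl_c` are illustrative only and are not part of lighteval. A bare `except:` swallows a `KeyboardInterrupt`. `except Exception:` lets it propagate, because `KeyboardInterrupt` is a subclass of `BaseException` but not of `Exception`.

```python
def run_guarded(action, catch_all: bool) -> str:
    """Run `action`; return "swallowed" if an exception was caught, else "ok"."""
    if catch_all:
        try:
            action()
        except:  # noqa: E722  (bare except also catches BaseException subclasses)
            return "swallowed"
    else:
        try:
            action()
        except Exception:  # ordinary errors only; KeyboardInterrupt passes through
            return "swallowed"
    return "ok"


def simulate_ctrl_c():
    # Stand-in for the user pressing Ctrl+C during a long evaluation
    raise KeyboardInterrupt


print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(KeyboardInterrupt, BaseException))  # True
print(run_guarded(simulate_ctrl_c, catch_all=True))  # swallowed

try:
    run_guarded(simulate_ctrl_c, catch_all=False)
except KeyboardInterrupt:
    print("KeyboardInterrupt propagated")
```

As a side note, E722 only flags a bare `except:`. With `except Exception:` the `# noqa: E722` marker no longer suppresses anything.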
 ),
-(r"$(3,1)$", r"${1,3}$", 1),
+# Shouldn't work as it's ordered tuple vs set
+# (r"$(3,1)$", r"${1,3}$", 1),
The first version of math-verify didn't distinguish between tuples and finite sets; the new version does. It's very hard to tell from the prediction alone what was meant to be a set and what wasn't, so we let the gold answer control the behavior.
Because the gold here is now an ordered tuple and the pred is a set whose elements are in a different order, the result is false. If the gold were the set {3,1}, or if the pred listed its elements in the same order as the gold, i.e. {3,1} (in which case we assume the user meant a tuple), it would be true.
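The rule described above, where the gold decides between set and tuple semantics, can be sketched as follows. This is a hypothetical illustration, not math-verify's actual implementation. `sketch_match` is an invented name, and the sketch only models the three cases discussed here. It works on already-parsed elements and ignores multiset and duplicate handling.

```python
def sketch_match(gold_elems, gold_is_set, pred_elems):
    """Hypothetical rule: the gold decides whether element order matters.

    gold_elems / pred_elems: elements in the order they were written.
    gold_is_set: True if the gold was written as a set, e.g. {3,1}.
    The prediction's own brackets are ignored: when the gold is a tuple,
    a set-looking prediction is read as a tuple.
    """
    if gold_is_set:
        # Set gold: compare as unordered collections
        return set(gold_elems) == set(pred_elems)
    # Tuple gold: order matters, even if the prediction used set braces
    return list(gold_elems) == list(pred_elems)


# gold $(3,1)$ vs pred ${1,3}$: ordered tuple vs set in a different order
print(sketch_match((3, 1), False, (1, 3)))  # False
# gold written as the set {3,1}: order no longer matters
print(sketch_match((3, 1), True, (1, 3)))   # True
# pred written as {3,1}: same order as the tuple gold, read as a tuple
print(sketch_match((3, 1), False, (3, 1)))  # True
```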
 ],
 )
 def test_math_numina_cases(gold, pred, expected):
     assert compare_strings(gold, pred, match_types=["latex", "expr"]) == expected
They all expect 1; would it be worth adding cases where the comparison should fail?
I think there are some with 0, to check that I didn't just `return True` hhh
But this was mostly to ensure that the newly supported cases work.
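Why cases expecting 0 matter can be shown with a small self-contained sketch. The case list and the `always_true`, `exact_match` and `failing_cases` helpers are hypothetical and stand in for lighteval's `compare_strings`; they are not its implementation. With only positive cases, a comparator that always returns 1 would pass. A single negative case exposes it.

```python
# Hypothetical (gold, pred, expected) cases in the same layout as the test above;
# the second one is a negative case that must NOT match.
CASES = [
    ("$(3,1)$", "$(3,1)$", 1),
    ("$(3,1)$", "$(1,3)$", 0),
]


def always_true(gold: str, pred: str) -> int:
    # Degenerate comparator: the "just return True" bug
    return 1


def exact_match(gold: str, pred: str) -> int:
    # Trivial stand-in that happens to label these two cases correctly
    return int(gold == pred)


def failing_cases(compare):
    """Return the cases where `compare` disagrees with the expected label."""
    return [(g, p, e) for g, p, e in CASES if compare(g, p) != e]


print(failing_cases(always_true))  # [('$(3,1)$', '$(1,3)$', 0)]
print(failing_cases(exact_match))  # []
```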
* update extraction match to reflect newest math-verify
* revert symbols, improve sets handling
* rm todo
* fmt + remove empty excepts + bump l2s
* fmt
* docstring
I have made several changes to math-verify to improve recall on correct answers; this PR syncs lighteval with them.
There are no more changes planned for it right now, so it should be stable from now on.