Sync Math-verify #535

hynky1999 · 2025-02-05T00:26:02Z

I have made several changes to math-verify to improve recall on correct answers; this PR should sync lighteval with them.
No further changes are planned for it right now, so it should be stable from now on.

HuggingFaceDocBuilderDev · 2025-02-05T00:28:07Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

NathanHB

Looks good! Ungodly amount of regex logic lmao, but I guess there is no way out ^^'

NathanHB · 2025-02-05T09:42:52Z

src/lighteval/metrics/dynamic_metrics.py

@@ -265,12 +269,19 @@ def sample_level_fn(golds: list[str], predictions: list[str], formatted_doc: Doc
        # We have to use timeout because the sypmy to str conversion can be very slow
        try:
            add_to_specifics_with_timeout(formatted_doc, extracted_predictions, extracted_golds)
-        except:  # noqa: E722
+        except Exception:  # noqa: E722


Why add `Exception` without using it?

Some exceptions don't inherit from Exception (e.g., KeyboardInterrupt). Those shouldn't be caught, as we want the program to stop when the user sends an interrupt.
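To illustrate the point above, here is a minimal sketch (not the lighteval code; `run_guarded` and the two helper functions are hypothetical names made up for this example). It shows that `except Exception` swallows ordinary errors while `KeyboardInterrupt`, which derives from `BaseException` rather than `Exception`, still propagates.

```python
def run_guarded(fn):
    """Call fn, swallowing ordinary errors only (hypothetical helper)."""
    try:
        return fn()
    except Exception:
        # Catches ValueError, TimeoutError, etc., but not KeyboardInterrupt
        # or SystemExit, which inherit directly from BaseException.
        return None


def raise_value_error():
    raise ValueError("ordinary failure")


def raise_interrupt():
    raise KeyboardInterrupt


# An ordinary error is swallowed and the caller keeps going.
swallowed = run_guarded(raise_value_error)

# A user interrupt passes straight through the `except Exception` clause.
propagated = False
try:
    run_guarded(raise_interrupt)
except KeyboardInterrupt:
    propagated = True
```

A bare `except:` in `run_guarded` would have caught the interrupt as well, which is the behavior the change avoids.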

NathanHB · 2025-02-05T09:50:47Z

tests/metrics/test_extractive_match.py

@@ -864,7 +858,8 @@ def test_math_extraction_edge_cases(gold, pred, expected):
            r"Since $AP:PB = 1:4,$ we can write \[\frac{\overrightarrow{A} - \overrightarrow{P}}{1} = \frac{\overrightarrow{B} - \overrightarrow{P}}{4}.\]Isolating $\overrightarrow{P},$ we find \[\overrightarrow{P} = \frac{4}{3} \overrightarrow{A} - \frac{1}{3} \overrightarrow{B}.\]Thus, $(t,u) = \boxed{\left( \frac{4}{3}, -\frac{1}{3} \right)}.$",
            1,
        ),
-        (r"$(3,1)$", r"${1,3}$", 1),
+        # Shouldn't work as it's ordered tuple vs set
+        # (r"$(3,1)$", r"${1,3}$", 1),


Why is it commented out?

The first version of math-verify didn't distinguish between tuples and finite sets; the new version does. It is very hard to know what was meant to be a set and what wasn't, so we rely on the gold, which controls the behavior.

Because the gold here is now an ordered tuple and the pred is a set with incorrect ordering, the result is false. If the gold were the set {3,1}, or the pred were {1,3} (we assume the user meant tuple), it would be true.
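One possible reading of this gold-driven rule can be sketched in plain Python. This is an illustration under stated assumptions, not math-verify's implementation: the `Parsed` type and `gold_driven_match` function are hypothetical, and the sketch assumes that a pred is compared in its written order whenever the gold is a tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parsed:
    """Hypothetical parse result: the container kind plus items in written order."""
    kind: str     # "tuple" or "set"
    items: tuple  # elements in the order they appeared in the text


def gold_driven_match(gold: Parsed, pred: Parsed) -> bool:
    """Let the gold decide whether order matters (assumed behavior)."""
    if gold.kind == "set":
        # Gold is a set: order is irrelevant on both sides.
        return set(gold.items) == set(pred.items)
    # Gold is an ordered tuple: the pred must list the same items in the same order.
    return gold.items == pred.items


# Gold (3,1) as a tuple vs. pred {1,3}: the ordering differs, so no match.
tuple_vs_misordered_set = gold_driven_match(Parsed("tuple", (3, 1)), Parsed("set", (1, 3)))

# Gold {3,1} as a set vs. pred {1,3}: order is ignored, so they match.
set_vs_set = gold_driven_match(Parsed("set", (3, 1)), Parsed("set", (1, 3)))
```

Under these assumptions the commented-out test case corresponds to the first call, which is why it no longer expects 1.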

NathanHB · 2025-02-05T09:52:27Z

tests/metrics/test_extractive_match.py

+    ],
+)
+def test_math_numina_cases(gold, pred, expected):
+    assert compare_strings(gold, pred, match_types=["latex", "expr"]) == expected


They all expect 1; would it be interesting to add cases where it should fail?

I think there are some with 0, to check that I didn't just do return True hhh

But this was mostly to ensure that the new supported stuff works

* update extraction match to reflect newest math-verify
* revert symbols, improve sets handling
* rm todo
* fmt + remove empty excepts + bump l2s
* fmt
* docstring

hynky1999 added 3 commits February 4, 2025 20:06

update extraction match to reflect newest math-verify

c2cb488

revert symbols, improve sets handling

c536de0

rm todo

a75113d

hynky1999 added 3 commits February 5, 2025 01:44

fmt + remove empty excepts + bump l2s

308ecb9

fmt

8b7711f

docstring

86f4978

hynky1999 requested a review from clefourrier February 5, 2025 09:29

NathanHB approved these changes Feb 5, 2025

hynky1999 merged commit cb35bea into main Feb 5, 2025
4 checks passed

hynky1999 added a commit that referenced this pull request May 22, 2025

Sync Math-verify (#535)

654dbe9

* update extraction match to reflect newest math-verify
* revert symbols, improve sets handling
* rm todo
* fmt + remove empty excepts + bump l2s
* fmt
* docstring

jiangwangyi mentioned this pull request Jun 30, 2025

[FT] Why not use math_verify as a direct dependency? #848

Open
