Question about task difficulty in the dataset #34

chunhuizhang · 2025-02-17T09:45:07Z

Hi, I noticed that the difficulty values in the dataset have different ranges (e.g. MATH in 1~5, and AIME in 0~10). Do the difficulty values for MATH, AIME and AMC share the same standard?

michaelzhiluo · 2025-02-17T10:27:47Z

It ultimately depends on the LLM used as the difficulty judge; see the prompt in system_prompts.py and the difficulty judge in deepscaler/data/preprocess. We average the difficulty score over 8 passes.
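
For readers who want a concrete picture of averaging a judge's score over repeated calls, here is a minimal sketch. It is not the code in deepscaler/data/preprocess: the names `average_difficulty` and `stub_judge` are hypothetical, and the stub stands in for a real LLM call that would use the prompt from system_prompts.py.

```python
import random
from statistics import mean
from typing import Callable


def average_difficulty(
    problem: str,
    judge: Callable[[str], float],
    n_passes: int = 8,
) -> float:
    """Call the judge n_passes times on one problem and return the mean score."""
    scores = [judge(problem) for _ in range(n_passes)]
    return mean(scores)


def stub_judge(problem: str) -> float:
    # Hypothetical stand-in for an LLM judge: returns a noisy integer score.
    # A real judge would send the problem and a system prompt to a model.
    return float(random.randint(1, 10))


if __name__ == "__main__":
    print(average_difficulty("Find all real x with x^2 = 2.", stub_judge))
```

Because every score comes from a judge, whether values from different source datasets are comparable depends on how the judge prompt defines the scale.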
