Why do the answers with the format issue (red ones) continue to increase after step 88 #2

merak0514 · 2025-02-19T13:14:30Z

Thanks for your great work! However, I have a question: why do the answers with the format issue (red ones) continue to increase after step 88?

lkevinzc · 2025-02-20T11:21:41Z

Hi @merak0514, that's a great question!

I will share my understanding of this, and I welcome further thoughts.

For this CountDown task, the LLM needs to try different candidate equations until it finds a valid one. During RL, the model gradually learns to output more retries, because more retries raise the chance of getting the answer correct and therefore maximize its reward. However, some (harder) questions need many retries, and these may exceed the response length budget. This leads to more answers with the format issue (red ones), because those responses cannot finish within the budget.
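To make the mechanism concrete, here is a minimal toy sketch rather than the actual training code. It assumes a hypothetical format rule (a response is well-formed only if it ends with a closed `<answer>...</answer>` block) and a hypothetical token budget; the names `build_response`, `truncate`, and `has_valid_format` are made up for illustration.

```python
import re

# Assumption: a response passes the format check only if it ends with a
# closed <answer>...</answer> block. The real checker may differ.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>\s*$", re.DOTALL)


def build_response(num_retries):
    """Toy response: `num_retries` failed attempts, then a final answer."""
    tokens = []
    for i in range(num_retries):
        tokens += ["try", f"#{i}", "->", "wrong", ";"]  # 5 tokens per retry
    tokens += ["<answer>", "(3+4)*2", "</answer>"]       # 3 tokens
    return tokens


def truncate(tokens, budget):
    """Cut the response off at the (hypothetical) length budget."""
    return tokens[:budget]


def has_valid_format(tokens):
    """True if the possibly truncated response still ends with a closed answer."""
    return ANSWER_RE.search(" ".join(tokens)) is not None


if __name__ == "__main__":
    budget = 20  # hypothetical response length budget, in tokens
    for retries in (1, 2, 3, 4, 8):
        ok = has_valid_format(truncate(build_response(retries), budget))
        print(f"retries={retries}: format ok = {ok}")
```

In this toy setup, responses with few retries fit within the budget and keep their closing answer tag, while responses with many retries are cut off before the answer appears and so fail the format check, even though the retry strategy itself is what the reward encourages.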
