Why do the answers with the format issue (red ones) continue to increase after step 88 #2

Open
merak0514 opened this issue Feb 19, 2025 · 1 comment

merak0514 commented Feb 19, 2025

Thanks for your great work! However, I have a question: why do the answers with the format issue (red ones) continue to increase after step 88?

Image

Collaborator

lkevinzc commented Feb 20, 2025

Hi @merak0514, that's a great question!

I will share my understanding on this and welcome further thoughts.

In the CountDown task, the LLM has to try different candidate equations until it finds a valid one. During RL, the model gradually learns to output more retries, because more retries make a correct answer more likely and therefore maximize its reward. For some harder questions, however, the number of retries needed can exceed the response length budget. Those responses cannot finish within the budget, so they show up as answers with the format issue (red ones), and their count keeps growing.
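To make the mechanism concrete, here is a minimal toy sketch of how a fixed length budget can turn a longer response into a format failure. Everything in it is an assumption for illustration, not this repo's code: the `<answer>...</answer>` tag convention, the budget of 64, whitespace-separated words standing in for tokens, and the helper names `build_response` and `has_valid_format`.

```python
import re

# Assumed format rule: a response is well-formed only if it contains a
# closed <answer>...</answer> block. The real checker may differ.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def build_response(num_retries: int) -> list[str]:
    """Hypothetical response: `num_retries` failed attempts, then a final answer."""
    tokens: list[str] = []
    for i in range(num_retries):
        # Each failed attempt costs 11 whitespace-separated "tokens".
        tokens += f"try {i} : does not reach the target , retry .".split()
    # The final answer costs 9 more "tokens".
    tokens += ["<answer>", "(", "6", "*", "4", ")", "+", "1", "</answer>"]
    return tokens


def has_valid_format(tokens: list[str], budget: int) -> bool:
    """Truncate to the length budget, then apply the format check."""
    truncated = " ".join(tokens[:budget])
    return ANSWER_RE.search(truncated) is not None


BUDGET = 64  # assumed response length budget, in toy tokens

for n in (1, 3, 5, 6, 8):
    response = build_response(n)
    ok = has_valid_format(response, BUDGET)
    print(f"retries={n} length={len(response)} format_ok={ok}")
```

In this toy setup a response with up to five retries still fits, while six or more retries push the answer block past the budget, so the truncated response fails the format check even though the full response would have passed.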
