feat: add kl fix #524


Draft
wants to merge 1 commit into main
Conversation

gshennvm (Contributor)

What does this PR do?

Compute the KL penalty as prescribed in https://arxiv.org/pdf/2506.09477. It is not enabled by default since it changes users' runs.

Signed-off-by: Gerald Shen <[email protected]>
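For context, a minimal sketch of the issue this flag targets. The PR does not state which of the paper's estimators it implements (the reviewer asks the same question below), so the function names and the specific estimator contrast here are illustrative assumptions, not the PR's actual code:

```python
import math

# Hypothetical per-token helpers contrasting the common "k3" KL estimator
# (unbiased in value, biased as a gradient estimate) with a variant whose
# gradient matches the true KL gradient, in the spirit of
# https://arxiv.org/pdf/2506.09477. Names are illustrative assumptions.

def k3_kl(logp, ref_logp):
    """k3 estimator: returns (value, d value / d logp)."""
    log_ratio = ref_logp - logp  # log(pi_ref / pi)
    value = math.exp(log_ratio) - log_ratio - 1.0
    grad = 1.0 - math.exp(log_ratio)  # biased as an estimate of grad KL
    return value, grad

def corrected_grad_kl(logp, ref_logp):
    """Same value as k3, but gradient (logp - ref_logp), an unbiased
    per-sample estimate of grad KL(pi || pi_ref) under samples from pi."""
    log_ratio = ref_logp - logp
    value = math.exp(log_ratio) - log_ratio - 1.0
    grad = logp - ref_logp
    return value, grad
```

In an autodiff framework this "same value, different gradient" combination is typically built with a stop-gradient/detach trick, so logged loss values are unchanged and only the backward pass differs.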
@gshennvm gshennvm changed the title add kl fix feat: add kl fix Jun 17, 2025
@gshennvm gshennvm added the CI:L0 Run doctests and unit tests label Jun 17, 2025
@gshennvm gshennvm requested a review from SahilJain314 June 17, 2025 18:22
SahilJain314 (Contributor) left a comment


Can you add a small doc blurb to the .md with all the loss_function details?

@@ -20,6 +20,8 @@ loss_fn:
   use_on_policy_kl_approximation: false
   use_importance_sampling_correction: false
   token_level_loss: true
+  # see https://arxiv.org/pdf/2506.09477
+  use_correct_grad_kl: false

Let's call it 'tang_grad_kl' instead of 'correct_grad_kl' to make it clear where it's coming from. Also, the paper has a few implementations; could you specify which one you're using?

@SahilJain314 SahilJain314 marked this pull request as draft June 17, 2025 23:48
Labels
CI:L0 Run doctests and unit tests