Correspondence between paper formulas and code #1

Open
a7217339 opened this issue Nov 13, 2024 · 1 comment

Comments

@a7217339

Hello, sorry to bother you. I am a beginner and have some basic questions. In the compute_logprobe function, why are multiple tokens indexed in responsive_log_probes, when each entry corresponds to the probability distribution of a single token? And how does the sum of these log_tokengsprob values correspond to the term [1_y || \pi_ref(· | x)] in the paper? I look forward to your reply. Thank you.

@jinnaiyuu
Collaborator

Thank you very much for your question.

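The KL divergence between the indicator function 1_y (the distribution that places all probability mass on the text y) and the reference policy \pi_ref reduces to the negative log probability of y under \pi_ref. Because the log probability of y is the sum of the log probabilities of its tokens, the per-token values are summed. We will make this clearer in future revisions.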
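
For readers following the thread, the reduction can be sketched as below. It assumes 1_y denotes the point-mass distribution on y, uses the convention 0 log 0 = 0, and writes y_t for the t-th token of y; the token-level factorization in the last line is the standard autoregressive one and is not quoted from the paper.

```latex
\begin{align*}
D_{\mathrm{KL}}\left[\mathbf{1}_y \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\right]
  &= \sum_{y'} \mathbf{1}_y(y') \log \frac{\mathbf{1}_y(y')}{\pi_{\mathrm{ref}}(y' \mid x)} \\
  &= \log \frac{1}{\pi_{\mathrm{ref}}(y \mid x)} \\
  &= -\log \pi_{\mathrm{ref}}(y \mid x) \\
  &= -\sum_{t=1}^{|y|} \log \pi_{\mathrm{ref}}(y_t \mid x, y_{<t})
\end{align*}
```

Only the term with y' = y survives in the first sum, which is why the whole divergence collapses to a single log probability.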

Please let us know if you have any other questions.
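
To illustrate why one token is indexed per position, here is a minimal sketch in plain Python. It is not the repository's function: the names `log_softmax`, `sequence_log_prob`, and `kl_point_mass_to_ref` and the toy logits are hypothetical, chosen only to show the idea. Each generation step has its own distribution over the vocabulary, and the token actually generated at that step selects one entry from it.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def sequence_log_prob(step_logits, token_ids):
    """Sum of log pi_ref(y_t | x, y_<t) over the generated tokens.

    step_logits[t] is the (hypothetical) reference-model logit vector at
    step t; token_ids[t] is the token that was generated at step t.
    """
    if len(step_logits) != len(token_ids):
        raise ValueError("need exactly one token id per generation step")
    total = 0.0
    for logits, tok in zip(step_logits, token_ids):
        # Index the step's distribution with the token chosen at that step.
        total += log_softmax(logits)[tok]
    return total


def kl_point_mass_to_ref(step_logits, token_ids):
    """KL[1_y || pi_ref(. | x)] = -log pi_ref(y | x)."""
    return -sequence_log_prob(step_logits, token_ids)


# Toy example (assumed values): vocabulary of 3 tokens, 2 generation steps.
step_logits = [
    [0.0, 0.0, 0.0],                                # uniform: 1/3 each
    [math.log(1.0), math.log(3.0), math.log(4.0)],  # probs 1/8, 3/8, 4/8
]
token_ids = [2, 1]  # p(y | x) = (1/3) * (3/8) = 1/8

kl = kl_point_mass_to_ref(step_logits, token_ids)
print(kl)
```

In a tensor library the loop is usually replaced by a single gather along the vocabulary axis, which is why the indexing looks like many tokens being looked up at once.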
