Hello, sorry to bother you. I am a beginner and have a few basic questions. In the compute_logprobe function, why are multiple tokens indexed into responsive_log_probes, which corresponds to the probability distribution over a single token? And how do the sums of these log_tokengsprob values correspond to KL[1_y || π_ref(· | x)] in the paper? I look forward to your reply. Thank you.
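The KL divergence between the indicator (point-mass) distribution 1_y and the reference policy \pi_ref reduces to the negative log probability of the text y under \pi_ref: because 1_y puts all of its mass on y, KL[1_y || \pi_ref(· | x)] = -log \pi_ref(y | x). Since y is a sequence of tokens, log \pi_ref(y | x) is the sum over positions t of log \pi_ref(y_t | x, y_<t), which is why one log probability is picked out per response token and the results are summed. We will make this clearer in future revisions.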
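As a rough illustration of that sum, here is a minimal pure-Python sketch. It is not the repository's code: the helper names (`log_softmax`, `sequence_logprob`, `kl_point_mass_to_policy`) and the toy logits are hypothetical, and the logits at each position are fixed by hand, whereas a real autoregressive model would compute them from x and the preceding tokens.

```python
import math


def log_softmax(logits):
    """Turn one position's logits into log-probabilities over the vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def sequence_logprob(logits_per_position, token_ids):
    """Sum, over positions, the log-probability of the token actually generated.

    Each position has its own distribution over the vocabulary; indexing it
    with that position's token id picks out a single log-probability.
    """
    total = 0.0
    for logits, tok in zip(logits_per_position, token_ids):
        total += log_softmax(logits)[tok]
    return total


def kl_point_mass_to_policy(logits_per_position, token_ids):
    """KL[1_y || pi(. | x)] for a point mass on y, i.e. -log pi(y | x)."""
    return -sequence_logprob(logits_per_position, token_ids)


# Toy example (assumed values): vocabulary of 3 tokens, response of 2 tokens.
toy_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
toy_response = [0, 2]

seq_lp = sequence_logprob(toy_logits, toy_response)
kl = kl_point_mass_to_policy(toy_logits, toy_response)
print(f"log pi(y | x) = {seq_lp:.4f}, KL = {kl:.4f}")
```

The indexing step selects one entry from each position's distribution, so "multiple tokens" are indexed only because the response has multiple positions.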
Please let us know if you have any other questions.