Matching the response tokens #197

Open
vwxyzjn opened this issue Jun 16, 2023 · 1 comment

Comments

@vwxyzjn

vwxyzjn commented Jun 16, 2023

Hello, thanks for the nice reference code! I noticed that the following code tries to match the response tokens, but it might match the instruction tokens instead:

response_token_ids_start_idx = None
for idx in np.where(batch["labels"][i] == response_token_ids[0])[0]:
    response_token_ids_start_idx = idx
    break

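This is because the loop breaks as soon as the first token matches. '### Response:\n' is encoded as [21017, 18261, 25, 198], but '### Instruction:\n' is encoded as [21017, 46486, 25, 198] and starts with the same token (21017), so the loop stops at the instruction header instead of the response header.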
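To make the failure concrete, here is a minimal sketch that uses the token IDs quoted above. The `labels` array is a made-up example (the body tokens 1 to 5 are placeholders, not real token IDs), not data taken from the actual training batch.

```python
import numpy as np

# Token IDs quoted in this issue.
instruction_ids = [21017, 46486, 25, 198]  # '### Instruction:\n'
response_ids = [21017, 18261, 25, 198]     # '### Response:\n'

# Hypothetical label sequence: instruction header, body, response header, body.
labels = np.array(instruction_ids + [1, 2, 3] + response_ids + [4, 5])

# Every position whose token equals the first response token.
first_token_matches = np.where(labels == response_ids[0])[0]

# Breaking on the first hit picks the instruction header, not the response header.
first_match = int(first_token_matches[0])
matched_window = labels[first_match:first_match + len(response_ids)].tolist()
print(first_match, matched_window == instruction_ids)
```

In this toy sequence the first-token check hits two positions, and the earlier one belongs to the instruction header.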

If the intent is indeed to match the response tokens, consider the following snippet instead :)

            for idx in np.where(batch["labels"][i] == response_token_ids[0])[0]:
                # `response_token_ids` is `'### Response:\n'`, here we are just making sure that the token IDs match
                if response_token_ids == examples[i]["input_ids"][idx:idx+len(response_token_ids)]:
                    response_token_ids_start_idx = idx
                    # stop at the first full match
                    break
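The same idea can be written as a standalone helper. This is only a sketch: `find_subsequence_start` is a hypothetical name, not a function from the repository, and it assumes the labels and the response token IDs are plain integer sequences or NumPy arrays.

```python
import numpy as np


def find_subsequence_start(sequence, pattern):
    """Return the first index where `pattern` occurs in `sequence`, or None."""
    sequence = np.asarray(sequence)
    pattern = np.asarray(pattern)
    n = len(pattern)
    if n == 0:
        return None
    # Use the first token only to find candidates, then compare the full window.
    for idx in np.where(sequence == pattern[0])[0]:
        if np.array_equal(sequence[idx:idx + n], pattern):
            return int(idx)
    return None


# Hypothetical labels: instruction header, body, response header, body.
labels = [21017, 46486, 25, 198, 1, 2, 3, 21017, 18261, 25, 198, 4, 5]
start = find_subsequence_start(labels, [21017, 18261, 25, 198])
print(start)  # 7
```

Comparing the whole window with `np.array_equal` also handles a candidate near the end of the sequence, where the slice is shorter than the pattern and the comparison simply returns False.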

Related issue on our side: huggingface/trl#445 (comment)

@srowen
Collaborator

srowen commented Jun 17, 2023

CC @matthayes, WDYT?
