Matching the response tokens #197

vwxyzjn · 2023-06-16T21:28:35Z

Hello, thanks for the nice reference code! I noticed that the following code tries to match the response tokens, but it may match the instruction tokens instead:

dolly/training/trainer.py

Lines 60 to 63 in aaa0ecb

    response_token_ids_start_idx = None
    for idx in np.where(batch["labels"][i] == response_token_ids[0])[0]:
        response_token_ids_start_idx = idx
        break

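This happens because the loop breaks as soon as the first token matches. '### Response:\n' is encoded as [21017, 18261, 25, 198] and '### Instruction:\n' is encoded as [21017, 46486, 25, 198]. Both start with token 21017, so the loop stops at the earlier '### Instruction:\n' instead of at '### Response:\n'.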
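A minimal sketch of the collision, using the token IDs quoted above. The surrounding filler IDs (1, 2, 3, 4) and the variable names are made up for illustration and do not come from the Dolly code.

```python
import numpy as np

# Token IDs quoted in this issue; the filler IDs (1, 2, 3, 4) are invented.
INSTRUCTION_KEY_IDS = [21017, 46486, 25, 198]  # '### Instruction:\n'
RESPONSE_KEY_IDS = [21017, 18261, 25, 198]     # '### Response:\n'

# A toy label sequence: instruction key first, response key later.
labels = np.array([1] + INSTRUCTION_KEY_IDS + [2, 3] + RESPONSE_KEY_IDS + [4])

# Same idea as the loop in trainer.py: take the first position whose
# token equals the first token of the response key.
first_token_hits = np.where(labels == RESPONSE_KEY_IDS[0])[0]
start_idx = int(first_token_hits[0])

# The first hit is the start of the instruction key, not the response key.
matched = labels[start_idx:start_idx + len(RESPONSE_KEY_IDS)].tolist()
print(first_token_hits.tolist(), matched == INSTRUCTION_KEY_IDS)
```

In this toy sequence token 21017 occurs twice, and the first occurrence belongs to the instruction key, which is the position the loop keeps.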

To resolve the issue, assuming you did intend to match the response tokens, consider the following snippet instead :)

            for idx in np.where(batch["labels"][i] == response_token_ids[0])[0]:
                # `response_token_ids` is `'### Response:\n'`, here we are just making sure that the token IDs match
                if response_token_ids == examples[i]["input_ids"][idx:idx+len(response_token_ids)]:
                    response_token_ids_start_idx = idx
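For reference, the same check can be sketched as a standalone helper that compares the whole key rather than only its first token. The function name `find_response_start` is hypothetical, and it assumes a non-empty key. Returning on the first full match is a choice made here; the snippet above has no `break`, so it would keep the last match.

```python
import numpy as np

def find_response_start(labels, response_token_ids):
    """Return the first index where the full key occurs in labels, or None.

    Hypothetical helper for illustration; assumes response_token_ids is non-empty.
    """
    labels = np.asarray(labels)
    key = np.asarray(response_token_ids)
    # Candidate positions: wherever the first token of the key appears.
    for idx in np.where(labels == key[0])[0]:
        # Accept a candidate only if every token of the key matches.
        # A slice that runs past the end is shorter, so array_equal is False.
        if np.array_equal(labels[idx:idx + len(key)], key):
            return int(idx)
    return None

instruction_key = [21017, 46486, 25, 198]  # '### Instruction:\n'
response_key = [21017, 18261, 25, 198]     # '### Response:\n'
toy_labels = [1] + instruction_key + [2, 3] + response_key + [4]

found = find_response_start(toy_labels, response_key)
missing = find_response_start([1] + instruction_key + [2], response_key)
print(found, missing)
```

Returning `None` when the key is absent lets the caller decide how to handle an example without a response marker.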

Our related issue: huggingface/trl#445 (comment)


srowen · 2023-06-17T13:30:01Z

CC @matthayes, WDYT?
