Error in DataCollatorSpeechSeq2SeqWithPadding (Unit 5) #85
Comments
If the |
If you look at the issue I mentioned, it seems that you stated the opposite. Here's what you wrote for context:
I might be wrong. But |
In Unit 5 of the audio course, the following code is used:
However, according to the following issue, `bos_token_id` shouldn't be used (@ArthurZucker). In my opinion, this should be replaced with `self.processor.tokenizer.convert_tokens_to_ids("<|startoftranscript|>")` or with `model.config.decoder_start_token_id`. What do you think?

Note: if this is true, then there would be a similar error in @sanchit-gandhi's fine-tuning tutorial too.
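To make the concern concrete, here is a minimal, dependency-free sketch of the kind of check being discussed. It assumes the collator drops the first label token when every sequence in the batch starts with a given start token, since that token would be prepended again later during training. The helper `strip_start_token` and the token ids below are hypothetical and for illustration only; they are not read from a real tokenizer, and the actual collator operates on tensors rather than Python lists.

```python
from typing import List


def strip_start_token(labels: List[List[int]], start_token_id: int) -> List[List[int]]:
    """Drop the first token of every sequence, but only if ALL sequences
    in the batch begin with `start_token_id`. Otherwise return the batch unchanged."""
    if labels and all(seq and seq[0] == start_token_id for seq in labels):
        return [seq[1:] for seq in labels]
    return labels


# Illustrative ids only (assumptions, not taken from a real tokenizer):
START_OF_TRANSCRIPT_ID = 50258  # stands in for "<|startoftranscript|>"
BOS_TOKEN_ID = 50257            # stands in for tokenizer.bos_token_id

# Two padded label sequences that both begin with the start-of-transcript id;
# -100 marks padding positions ignored by the loss.
batch = [
    [50258, 50259, 50359, 11, 22, 50257],
    [50258, 50259, 50359, 33, -100, -100],
]

# Comparing against the start-of-transcript id strips the leading token.
stripped = strip_start_token(batch, START_OF_TRANSCRIPT_ID)

# Comparing against a different id (here the stand-in for bos_token_id)
# never matches, so the leading token is silently kept.
unchanged = strip_start_token(batch, BOS_TOKEN_ID)
```

If the two ids differ for the tokenizer in use, a check against `bos_token_id` would never fire, which is the behaviour this issue is asking about.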
Thanks for your attention.
Regards,
Tony