Question about asr2.sh and its options to reproduce the librispeech_100 recipe. #5720
Comments
This sounds good to me.
I think feeding … @simpleoier, do you have any idea?
Hi @YoshikiMas, thanks for noticing the problem.
I see; this is an LM for the natural text, not a speech token LM.
Yes, exactly!
Thank you for pointing it out! I confirmed that …; maybe the only issue is …
Describe the bug
I ran into a few issues while trying to reproduce the asr2 recipe for librispeech_100. I would like to confirm whether my approach is appropriate, especially for the second point. Once the best approach is settled, I'll be happy to make a PR.
Stage 5
In these lines, there are `${_suf}${dset}/${train_set}` and `${_suf}${dset}/${train_set}_sp`. Since `dset` comes from a previous for loop, I expect it should be removed (see the sketch below).
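To make the suspected bug concrete, here is a minimal, self-contained bash sketch. It is not the actual asr2.sh code: the variable names mirror the recipe, but the values and the loop body are made up for illustration.

```bash
#!/usr/bin/env bash
# Illustration only: after the loop finishes, `dset` still holds its last
# value, so any later path that interpolates it is silently wrong.
_suf="/org"
train_set="train_clean_100"

for dset in dev_clean test_clean; do
    : # per-dataset work from an earlier stage
done

echo "with stray \${dset}: ${_suf}${dset}/${train_set}"    # /orgtest_clean/train_clean_100
echo "without it:         ${_suf}/${train_set}"            # /org/train_clean_100
echo "sp variant:         ${_suf}/${train_set}_sp"         # /org/train_clean_100_sp
```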
Stage 6 and Stage 7
`lm_train_text` is `data/${train_set}/text.${tgt_case}.${tgt_lang}` in the default run.sh, which does not work in Stage 6. This is because `${train_set}/text.${tgt_case}.${tgt_lang}` is now included only in `dump/extracted` if you use speed perturbation. I can set `lm_train_text` to `dump/extracted/wavlm_large/layer21/${train_set}/text.${tgt_case}.${tgt_lang}` (a sketch of this override is below), but do you have any simpler ideas?
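For reference, the workaround I mean is just overriding the variable in run.sh. A minimal sketch, assuming `train_set`, `tgt_case`, and `tgt_lang` are already defined there, and that `wavlm_large/layer21` matches the SSL model and layer used for feature extraction:

```bash
# Hypothetical override in run.sh; "wavlm_large/layer21" must match the
# feature-extraction settings, so adjust it to your configuration.
lm_train_text="dump/extracted/wavlm_large/layer21/${train_set}/text.${tgt_case}.${tgt_lang}"
```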
A similar problem happens in Stage 7 with `src_bpe_train_text` and `tgt_bpe_train_text`; the analogous overrides are sketched after this paragraph.
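The same kind of override would presumably work for Stage 7, assuming the source and target texts follow the same naming pattern. The `src_case`/`src_lang` components here are guesses by analogy with the target side, not confirmed against the recipe:

```bash
# Hypothetical Stage 7 overrides, by analogy with lm_train_text above.
src_bpe_train_text="dump/extracted/wavlm_large/layer21/${train_set}/text.${src_case}.${src_lang}"
tgt_bpe_train_text="dump/extracted/wavlm_large/layer21/${train_set}/text.${tgt_case}.${tgt_lang}"
```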