
Question about asr2.sh and its options to reproduce the librispeech_100 recipe. #5720

Open
YoshikiMas opened this issue Mar 26, 2024 · 5 comments
Labels: ASR (Automatic speech recognition), Question

Comments

@YoshikiMas
Contributor

Describe the bug
I ran into a few problems while trying to reproduce the asr2 recipe for librispeech_100. I would like to confirm whether my workarounds are appropriate, especially the second one. Once we settle on the best approach, I'll be happy to open a PR.

Stage 5
In these lines, the paths ${_suf}${dset}/${train_set} and ${_suf}${dset}/${train_set}_sp appear. Since ${dset} is left over from a previous for loop, I expect it should be removed.
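The stage 5 symptom comes from a common shell gotcha: a for-loop variable keeps its last value after the loop finishes. A minimal, self-contained illustration (the values of `_suf` and `train_set` here are made up, not the recipe's actual ones):

```shell
#!/bin/sh
# Illustration only: ${dset} retains the last value assigned by the
# earlier loop, so it silently leaks into the later path.
_suf="dump/extracted/"          # hypothetical value
train_set=train_clean_100       # hypothetical value
for dset in dev_clean test_clean; do
    : # per-dset processing happens here in the real script
done
# Buggy pattern from the issue: the stale ${dset} is still in scope.
echo "${_suf}${dset}/${train_set}"  # → dump/extracted/test_clean/train_clean_100
```

Dropping the `${dset}` component, as suggested, removes the dependence on whatever the earlier loop happened to end on.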

Stage 6 and Stage 7
lm_train_text is data/${train_set}/text.${tgt_case}.${tgt_lang} in the default run.sh, which does not work in Stage 6. This is because ${train_set}/text.${tgt_case}.${tgt_lang} now exists only under dump/extracted when speed perturbation is used. I can set lm_train_text to dump/extracted/wavlm_large/layer21/${train_set}/text.${tgt_case}.${tgt_lang}, but do you have any simpler ideas?
A similar problem occurs in Stage 7 with src_bpe_train_text and tgt_bpe_train_text.
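For reference, the workaround described above amounts to a plain variable override. This is only a sketch of the path construction; the wavlm_large and layer21 components come from the issue itself, while the tgt_case/tgt_lang values below are illustrative placeholders:

```shell
#!/bin/sh
# Sketch of the workaround: point lm_train_text at the copy of the
# transcript that actually exists under dump/extracted after speed
# perturbation. tgt_case/tgt_lang values are placeholders.
train_set=train_clean_100
tgt_case=ts
tgt_lang=en
lm_train_text="dump/extracted/wavlm_large/layer21/${train_set}/text.${tgt_case}.${tgt_lang}"
echo "${lm_train_text}"
```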

@YoshikiMas added the Question label on Mar 26, 2024
@sw005320
Contributor

Stage 5 In these lines, there are ${_suf}${dset}/${train_set} and ${_suf}${dset}/${train_set}_sp. Since dset comes from a previous for loop, I expect it should be removed.

This sounds good to me.

Stage 6 and Stage 7 lm_train_text is data/${train_set}/text.${tgt_case}.${tgt_lang} in the default run.sh, which does not work in Stage 6. This is because ${train_set}/text.${tgt_case}.${tgt_lang} is now included in only dump/extracted if you use speed perturbation. I can set lm_train_text to dump/extracted/wavlm_large/layer21/${train_set}/text.${tgt_case}.${tgt_lang}, but do you have any simpler ideas? A similar problem happens in the stage 7 with src_bpe_train_text and tgt_bpe_train_text.

Feeding dump/extracted/wavlm_large/layer21/${train_set}/text.${tgt_case}.${tgt_lang} sounds good to me.
One option would be to get the configurations (e.g., wavlm_large and layer21) from the config file, but it is tricky.
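One way to read that idea: if the recipe already held the upstream model name and layer as variables, the path could be assembled from them rather than hard-coded. A hypothetical sketch (the variable names `feats_upstream` and `feats_layer` are illustrative, not actual asr2.sh options):

```shell
#!/bin/sh
# Hypothetical: derive the extracted-feature directory from variables
# the recipe could already hold, instead of hard-coding the path.
feats_upstream=wavlm_large   # illustrative variable name
feats_layer=21               # illustrative variable name
train_set=train_clean_100
extracted_dir="dump/extracted/${feats_upstream}/layer${feats_layer}/${train_set}"
echo "${extracted_dir}"
```

As noted, the tricky part is reliably extracting those values from the config file rather than duplicating them in run.sh.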

@simpleoier, do you have any idea?

@sw005320 added the ASR (Automatic speech recognition) label on Mar 26, 2024
@simpleoier
Collaborator

Hi @YoshikiMas, thanks for noticing the problem.
Yeah, I agree with the suggested change in stage 5.
For the lm_train_text, we can simply use data/${train_set}/text, the same as what is used in asr1. We do not need to use the pattern of ${src_case} or ${tgt_case}.
I'll prepare a PR for this after I finish the test.
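The fix described here would reduce to something like the following sketch (asr1's actual defaults may differ in detail):

```shell
#!/bin/sh
# Sketch of the simpler default: use the case-independent transcript
# kept under data/, as asr1 does, instead of the dumped
# text.${tgt_case}.${tgt_lang} variant.
train_set=train_clean_100
lm_train_text="data/${train_set}/text"
echo "${lm_train_text}"
```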

@sw005320
Contributor

For the lm_train_text, we can simply use data/${train_set}/text, the same as what is used in asr1. We do not need to use the pattern of ${src_case} or ${tgt_case}. I'll prepare a PR for this after I finish the test.

I see; this is an LM for the natural text, not a speech token LM.

@simpleoier
Collaborator

Yes, exactly!

@YoshikiMas
Contributor Author

We do not need to use the pattern of ${src_case} or ${tgt_case}.

Thank you for pointing that out! I confirmed that data/${train_set}/text is identical to dump/extracted/.../text.${tgt_case}.${tgt_lang}.

Maybe the only remaining issue is src_bpe_train_text. Of course, we can specify it directly, as Shinji suggested for lm_train_text.
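Putting the thread's conclusion together as a hedged sketch: lm_train_text falls back to data/${train_set}/text, while src_bpe_train_text is specified directly against the dumped copy. All concrete values below (src_case, src_lang, and the path components other than wavlm_large/layer21, which appear in the issue) are placeholders:

```shell
#!/bin/sh
# Placeholder values throughout; only the pattern matters.
train_set=train_clean_100
src_case=rm        # placeholder
src_lang=km        # placeholder
lm_train_text="data/${train_set}/text"
src_bpe_train_text="dump/extracted/wavlm_large/layer21/${train_set}/text.${src_case}.${src_lang}"
echo "${lm_train_text}"
echo "${src_bpe_train_text}"
```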


3 participants