Bugs in reproducing VoxtLM v1 #5777
Comments
Stage 3
Many thanks for your report.
You can download this file here: https://huggingface.co/soumi-maiti/voxtlm-k1000/blob/main/km_1000.mdl
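The link above is a Hugging Face "blob" URL, which opens the web viewer rather than the raw file. Below is a minimal sketch that assumes the usual Hugging Face convention: replacing `/blob/` with `/resolve/` gives a direct download URL. `hf_raw_url` is a hypothetical helper name, and this thread does not say where `km_1000.mdl` has to be placed inside the recipe.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert a Hugging Face "blob" (web viewer) URL into the "resolve" URL
# that serves the raw file. Only the first "/blob/" is replaced.
hf_raw_url() {
  printf '%s\n' "$1" | sed 's#/blob/#/resolve/#'
}

blob_url="https://huggingface.co/soumi-maiti/voxtlm-k1000/blob/main/km_1000.mdl"
raw_url="$(hf_raw_url "${blob_url}")"
echo "${raw_url}"

# Uncomment to download into the current directory (requires network access):
# wget -O km_1000.mdl "${raw_url}"
```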
Thank you! I understand the problem now.
Yes, that's right. However, if you just want to use the pretrained VoxtLM model, I recommend skipping the k-means training stage and using the released pretrained model.
Thanks. My objective is not just to use the model but to run every step myself so that I can understand and improve on it. I also think it's important for the recipe to be fully reproducible.
I second that.
Stage 3 (cont.)
Stage 4
Stage 4 (cont.)
Stage 10
@wyh2000 Do you know where I can find this file?
Thanks!
Bug description
There seem to be several bugs when reproducing VoxtLM v1 with `egs2/voxtlm_v1`. I haven't figured all of them out yet, but I'm opening this issue and will update it incrementally.
Hopefully I'll send a pull request once all the errors are resolved.
Basic environment
Linux 3.10.0-1160.80.1.el7.x86_64 #1 SMP Tue Nov 8 15:48:59 UTC 2022 x86_64
Task information
To reproduce
Steps to reproduce the behavior:
cd egs2/voxtlm_v1/lm1
./run.sh
I'm using reduced data to speed up the debugging process.
Errors
Stage 1
Missing path.sh
Missing db.sh
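A hedged sketch of a fail-fast check for the two missing files, so the recipe stops with a clear message instead of failing later when it tries to source them. The helper name `missing_files` is hypothetical, the temporary directory only stands in for `egs2/voxtlm_v1/lm1`, and the sketch says nothing about what `path.sh` and `db.sh` should contain.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the name of each required file missing from a recipe directory,
# one per line; print nothing when everything is present.
missing_files() {
  local dir=$1
  shift
  local f
  for f in "$@"; do
    [ -e "${dir}/${f}" ] || echo "${f}"
  done
}

# Toy stand-in for egs2/voxtlm_v1/lm1 that only contains run.sh.
recipe_dir="$(mktemp -d)"
trap 'rm -rf "${recipe_dir}"' EXIT
touch "${recipe_dir}/run.sh"

missing="$(missing_files "${recipe_dir}" path.sh db.sh)"
if [ -n "${missing}" ]; then
  echo "Missing required files:" >&2
  echo "${missing}" >&2
  # A real recipe would stop here, e.g. with: exit 1
fi
```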
Strange stage number configuration
espnet/egs2/voxtlm_v1/lm1/local/data_librispeech.sh, line 73 (commit 5b5ae5a): `--stage 2` is specified for this script.
gzip: data/librispeech/textlm/librispeech-lm-norm.txt.gz: No such file or directory
at local/data_librispeech.sh, line 83. Related code in espnet/egs2/voxtlm_v1/lm1/local/data_librispeech.sh (commit 5b5ae5a): lines 76, 78, and 83.
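The `--stage 2` item is easier to reason about with ESPnet's usual stage gating in mind: egs2 scripts typically wrap each step in `if [ ${stage} -le N ] && [ ${stop_stage} -ge N ]`, so a caller that passes `--stage 2` skips stage 1 of the called script. A simplified sketch of that mechanism, assuming the real script parses `--stage`/`--stop_stage` elsewhere; the stage bodies are placeholders, since what stage 1 of `data_librispeech.sh` does is not stated here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Simplified ESPnet-style stage gating: stage N runs only when
# stage <= N <= stop_stage. Stage bodies are hypothetical placeholders
# that just record which stages ran.
run_stages() {
  local stage=$1 stop_stage=$2
  local ran=""
  if [ "${stage}" -le 1 ] && [ "${stop_stage}" -ge 1 ]; then
    ran+="1 "
  fi
  if [ "${stage}" -le 2 ] && [ "${stop_stage}" -ge 2 ]; then
    ran+="2 "
  fi
  echo "${ran% }"
}

run_stages 1 100   # prints "1 2"
run_stages 2 100   # prints "2": stage 1 is skipped
```

Any output that only stage 1 would create is therefore absent when the script starts at stage 2.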
Logging causes problems during data preprocessing
At local/data_librilight.sh line 75, the script produces segment_audio.log (espnet/egs2/voxtlm_v1/lm1/local/data_librilight.sh, line 75, commit 5b5ae5a).
local/librilight/data_prep_librilight.sh is then called (espnet/egs2/voxtlm_v1/lm1/local/data_librilight.sh, line 83, commit 5b5ae5a).
data_prep_librilight.sh executes a find command with a pipe (espnet/egs2/voxtlm_v1/lm1/local/librilight/data_prep_librilight.sh, line 30, commit 5b5ae5a).
Possible fix: add `grep -v logdir` to the pipe.
cp: cannot stat 'data/librispeech/asr/audio': No such file or directory
See espnet/egs2/voxtlm_v1/lm1/local/data_librilight.sh, lines 91 and 97 (commit 5b5ae5a).
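For the logging item: the actual find command at line 30 of data_prep_librilight.sh is not reproduced in this issue, so the following is a toy reproduction. It assumes the command lists every file under the segmented-audio directory and that segment_audio.log is written into a `logdir/` subdirectory there (an assumption inferred from the `grep -v logdir` fix). It shows the log file leaking into the file list and two ways to filter it out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy layout (hypothetical): segmented audio plus a logdir/ holding the
# segment_audio.log written during segmentation.
root="$(mktemp -d)"
trap 'rm -rf "${root}"' EXIT
mkdir -p "${root}/spk1/chapter1" "${root}/logdir"
touch "${root}/spk1/chapter1/utt1.flac" "${root}/logdir/segment_audio.log"

# Unfiltered: the log file is listed alongside the audio.
all_files="$(find "${root}" -type f | sort)"

# Filtered with grep -v: drops every path that contains "logdir".
audio_files="$(find "${root}" -type f | grep -v logdir | sort)"

# Alternative without grep: exclude only paths under a logdir/ directory.
audio_files_find="$(find "${root}" -type f ! -path '*/logdir/*' | sort)"

printf '%s\n' "${audio_files}"
```

`grep -v logdir` removes any path containing that substring anywhere (including an utterance or directory name), while `! -path '*/logdir/*'` only excludes files inside a directory named `logdir`.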