
Bugs in reproducing VoxtLM v1 #5777

Open
cromz22 opened this issue May 7, 2024 · 14 comments
Labels
Bug bug should be fixed

Comments

@cromz22
Contributor

cromz22 commented May 7, 2024

Bug description

There seem to be several bugs in reproducing VoxtLM v1 with egs2/voxtlm_v1.
I haven't fully figured out all of them, but I'll start this issue and update it incrementally.
Hopefully I'll send a pull request once all the errors are resolved.

Basic environments

  • OS information: Linux 3.10.0-1160.80.1.el7.x86_64 #1 SMP Tue Nov 8 15:48:59 UTC 2022 x86_64
  • python version: 3.10.14
  • espnet version: 202402
    • Git hash: 0d0428d
    • Commit date: 2024-04-28 23:01:19 +0900
  • pytorch version: 2.3.0

Task information

  • Task: LM
  • Recipe: voxtlm_v1
  • ESPnet2

To reproduce

Steps to reproduce the behavior:

  1. move to the recipe directory cd egs2/voxtlm_v1/lm1
  2. execute run.sh

I'm using reduced data to speed up the debugging process.

Errors

Stage 1

  • Missing path.sh

    • path.sh is missing from the directory and needs to be copied from the template.
  • Missing db.sh

    • db.sh is missing from the directory and needs to be copied from the template and edited to point to the actual data directory.
  • Strange stage number configuration

    if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then

    • This should be stage 3. The wrong number causes problems when --stage 2 is specified for this script.
    • The stage numbering also differs across the per-dataset scripts (some start from 1, others from -1, etc.), which is confusing.
  • gzip: data/librispeech/textlm/librispeech-lm-norm.txt.gz: No such file or directory at local/data_librispeech.sh line 83

  • Logging output causes problems during data preprocessing

  • cp: cannot stat 'data/librispeech/asr/audio': No such file or directory

    cp -r ${data_dir_librispeech_asr}/audio ${data_dir}/${train_dev}
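A minimal sketch of the stage-guard fix (variable values and the echoed message are illustrative, not the recipe's actual code): guarding the block with the stage number it really belongs to means `--stage 2` no longer skips it.

```shell
# Illustrative sketch: the data-preparation block should be guarded by
# its real stage number (3), not 1.
stage=2        # as if the user passed --stage 2
stop_stage=10
ran=no
if [ "${stage}" -le 3 ] && [ "${stop_stage}" -ge 3 ]; then
  ran=yes
  echo "stage 3: LibriSpeech data preparation"
fi
```

With the original `[ ${stage} -le 1 ]` guard, `--stage 2` evaluates `[ 2 -le 1 ]` as false and the block is silently skipped.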

@cromz22 cromz22 added the Bug bug should be fixed label May 7, 2024
@cromz22
Contributor Author

cromz22 commented May 7, 2024

Stage 3

  • (Not exactly a bug) Need to install s3prl before execution
  • FileNotFoundError: [Errno 2] No such file or directory: 'exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl'
    • Full log:
% bash run.sh --stage 3 --stop_stage 3
2024-05-04T02:43:09 (lm.sh:209:main) ./lm.sh --stage 1 --stop_stage 9 --num_splits_lm 1 --nj 16 --ngpu 4 --gpu_inference true --inference_nj 8 --lang en --token_type bpe --nbpe 10000 --bpe_nlsyms data/nlsyms.txt --bpe_train_text data/train/bpe_text --lm_config conf/train_transformer_size768_e12.yaml --train_set train --valid_set dev --test_sets test --inference_lm valid.acc.ave.pth --km_dir  --lm_inference_asr_config conf/decode_lm_asr.yaml --lm_inference_tts_config conf/decode_lm_tts.yaml --lm_test_text_asr dump/raw/test/text.asr --lm_test_text_tts dump/raw/test/text.tts --lm_test_text_textlm dump/raw/test/text.textlm --lm_test_text_speechlm dump/raw/test/text.speechlm --stage 3 --stop_stage 3
2024-05-04T02:43:09 (lm.sh:357:main) Skipped stages:  11 12 13
2024-05-04T02:43:09 (lm.sh:412:main) Stage 3a: Perform Kmeans using hubert_base features
2024-05-04T02:43:09 (perform_kmeans.sh:54:main) scripts/feats/perform_kmeans.sh --stage 1 --stop-stage 4 --train_set train --dev_set dev --other_sets test  --datadir dump/audio_raw/asr --featdir dump/extracted/asr --audio_format flac --feature_type hubert_base --layer 6 --feature_conf {type=s3prl,conf={s3prl_conf={upstream=hubert_base},download_dir=ckpt,multilayer_feature=False,layer=6}} --km_dir exp/kmeans/hubert_base_6_1000clusters --portion 0.1 --nclusters 1000 --storage_save_mode true --use_gpu true --nj 16 --cpu_cmd run.pl --cuda_cmd run.pl --skip_stages 2
2024-05-04T02:43:10 (perform_kmeans.sh:92:main) stage 1: Dump hubert_base feature
utils/subset_data_dir.sh: reducing #utt from 358 to 35
2024-05-04T02:43:10 (perform_kmeans.sh:119:main) Subsampling 35 utterances for feature dumping.
Dump SSL train_subset0.1 features to dump/extracted/asr/hubert_base/layer6/train_subset0.1
utils/copy_data_dir.sh: copied data from dump/audio_raw/asr/train_subset0.1 to dump/extracted/asr/hubert_base/layer6/train_subset0.1
utils/validate_data_dir.sh: Successfully validated data-directory dump/extracted/asr/hubert_base/layer6/train_subset0.1
2024-05-04T02:45:34 (perform_kmeans.sh:216:main) stage 3: Generate K-means pseudo-labels
2024-05-04T02:45:34 (perform_kmeans.sh:225:main) Extract labels to dump/extracted/asr/hubert_base/layer6/train
utils/copy_data_dir.sh: copied data from dump/audio_raw/asr/train to dump/extracted/asr/hubert_base/layer6/train
utils/validate_data_dir.sh: Successfully validated data-directory dump/extracted/asr/hubert_base/layer6/train
run.pl: 16 / 16 failed, log is in dump/extracted/asr/hubert_base/layer6/train/logdir/inference_pseudo_labels_km1000.*.log

% cat dump/extracted/asr/hubert_base/layer6/train/logdir/inference_pseudo_labels_km1000.1.log
# python3 pyscripts/feats/dump_km_label.py --in_filetype sound --online_feature_extract true --feature_conf "{type=s3prl,conf={s3prl_conf={upstream=hubert_base},download_dir=ckpt,multilayer_feature=False,layer=6}}" --km_path exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl --out_filetype mat --use_gpu true --utt2num_samples dump/extracted/asr/hubert_base/layer6/train/logdir/utt2num_samples.1 scp:dump/extracted/asr/hubert_base/layer6/train/logdir/inference_kmeans.1.scp ark,t:dump/extracted/asr/hubert_base/layer6/train/logdir/pseudo_labels_km1000.1.txt
# Started at Sat May  4 02:45:35 JST 2024
#
2024-05-04 02:45:39 | INFO | root | Namespace(km_path='exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl', use_gpu=True, online_feature_extract=True, feature_conf='{type=s3prl,conf={s3prl_conf={upstream=hubert_base},download_dir=ckpt,multilayer_feature=False,layer=6}}', batch_bins=1, utt2num_samples='dump/extracted/asr/hubert_base/layer6/train/logdir/utt2num_samples.1', in_filetype='sound', out_filetype='mat', rspecifier='scp:dump/extracted/asr/hubert_base/layer6/train/logdir/inference_kmeans.1.scp', wspecifier='ark,t:dump/extracted/asr/hubert_base/layer6/train/logdir/pseudo_labels_km1000.1.txt')
Traceback (most recent call last):
  File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/pyscripts/feats/dump_km_label.py", line 184, in <module>
    dump_label(**vars(args))
  File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/pyscripts/feats/dump_km_label.py", line 133, in dump_label
    apply_kmeans = ApplyKmeans(km_path, use_gpu=use_gpu)
  File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/pyscripts/feats/dump_km_label.py", line 90, in __init__
    self.km_model = joblib.load(km_path)
  File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 650, in load
    with open(filename, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl'
# Accounting: time=5 threads=1
# Ended (code 1) at Sat May  4 02:45:40 JST 2024, elapsed time 5 seconds

@sw005320
Contributor

sw005320 commented May 7, 2024

Many thanks for your report.
@wyh2000, can you help Shuichiro?

@wyh2000
Contributor

wyh2000 commented May 7, 2024

Many thanks for your report. @wyh2000, can you help Shuichiro?

Yes! I think these errors are because some files are missing. I'll share the paths and files with you, @cromz22.

@wyh2000
Contributor

wyh2000 commented May 7, 2024

Stage 3

  • (Not exactly a bug) Need to install s3prl before execution

  • FileNotFoundError: [Errno 2] No such file or directory: 'exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl'

    • Full log: (identical to the log quoted in the comment above)
For this file, you could access it from here https://huggingface.co/soumi-maiti/voxtlm-k1000/blob/main/km_1000.mdl
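A sketch of placing the released model where perform_kmeans.sh expects it (the `resolve/main` URL form is an assumption based on Hugging Face's usual download convention; the download line is left commented out here so the snippet makes no network access).

```shell
# Sketch: put the released k-means model at the path the recipe expects.
km_dir=exp/kmeans/hubert_base_6_1000clusters
mkdir -p "${km_dir}"
url=https://huggingface.co/soumi-maiti/voxtlm-k1000/resolve/main/km_1000.mdl
# wget -O "${km_dir}/km_1000.mdl" "${url}"   # uncomment to actually download
echo "would fetch ${url} into ${km_dir}"
```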

@cromz22
Contributor Author

cromz22 commented May 7, 2024

Thank you! I understand the problem now.
The stage for learning k-means should not be skipped, but the default setting skips it, which silently causes an error at the next stage.

@wyh2000
Contributor

wyh2000 commented May 7, 2024

Thank you! I understand the problem now. The stage for learning k-means should not be skipped, but the default setting skips it, which silently causes an error at the next stage.

Yes, that's right. But if you want to use the pretrained VoxtLM model, I recommend skipping the k-means learning stage and using the released pretrained model.

@cromz22
Contributor Author

cromz22 commented May 7, 2024

Thanks, my objective is not just to use it but to execute all the steps myself to understand and improve upon it. Also, I think it's important to make the recipe fully reproducible.

@sw005320
Contributor

sw005320 commented May 7, 2024

Thanks, my objective is not just to use it but to execute all the steps myself to understand and improve upon it. Also, I think it's important to make the recipe fully reproducible.

I second that.
We should make this fully reproducible.

@cromz22
Contributor Author

cromz22 commented May 10, 2024

Stage 3 (cont.)

  • Invalid handling of loop iterable

    for dset in "${train_set} ${valid_set}" ${test_sets}; do

    • Because ${train_set} and ${valid_set} are merged into a single quoted string, the train and dev files are not copied. This causes the following error at the next stage:
    % bash run.sh --stage 4 --stop_stage 4
    2024-05-11T03:46:31 (lm.sh:209:main) ./lm.sh --stage 1 --stop_stage 9 --num_splits_lm 1 --nj 16 --ngpu 4 --gpu_inference true --inference_nj 8 --lang en --token_type bpe --nbpe 10000 --bpe_nlsyms data/nlsyms.txt --bpe_train_text data/train/bpe_text --lm_config conf/train_transformer_size768_e12.yaml --train_set train --valid_set dev --test_sets test --inference_lm valid.acc.ave.pth --km_dir  --lm_inference_asr_config conf/decode_lm_asr.yaml --lm_inference_tts_config conf/decode_lm_tts.yaml --lm_test_text_asr dump/raw/test/text.asr --lm_test_text_tts dump/raw/test/text.tts --lm_test_text_textlm dump/raw/test/text.textlm --lm_test_text_speechlm dump/raw/test/text.speechlm --stage 4 --stop_stage 4
    2024-05-11T03:46:31 (lm.sh:357:main) Skipped stages:  11 12 13
    2024-05-11T03:46:31 (lm.sh:476:main) Stage 4a: Data filtering: dump/raw/org -> dump/raw
    train
    Opened file: dump/raw/train
    textlm: dump/raw/train/text/textlm/text
    Traceback (most recent call last):
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/local/prepare_lm_data.py", line 177, in <module>
        prepare_textlm(
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/local/prepare_lm_data.py", line 33, in prepare_textlm
        uttid2text = read_text(root / "text")
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/local/prepare_lm_data.py", line 12, in read_text
        with text.open("r") as fp:
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/pathlib.py", line 1119, in open
        return self._accessor.open(self, mode, buffering, encoding, errors,
    FileNotFoundError: [Errno 2] No such file or directory: 'dump/raw/train/text/textlm/text'
    
  • (Not a bug) Inefficient copying of files

    • This is fine for a small amount of data, but for large data the copying is inefficient; creating symbolic links should be enough.
      cp "${_dir}/text" "${data_feats}/${dset}/text/$(basename ${_dir})/"
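A sketch of the loop fix (set names assumed from the recipe defaults): removing the quotes lets the shell word-split the list, so train and dev are visited individually instead of as the single string "train dev", which matches no directory.

```shell
# Illustrative values; in the recipe these come from run.sh.
train_set=train
valid_set=dev
test_sets=test
visited=""
# Unquoted, so each set becomes its own loop iteration.
for dset in ${train_set} ${valid_set} ${test_sets}; do
  visited="${visited}${dset} "
done
echo "${visited}"
```

For the copying concern above, replacing `cp` with `ln -s` on the text files would avoid duplicating large data.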

@cromz22
Contributor Author

cromz22 commented May 10, 2024

Stage 4

  • Wrong variable name in local/prepare_bpe_text.py (num -> num_utterances)

    % bash run.sh --stage 4 --stop_stage 4
    2024-05-11T04:21:47 (lm.sh:209:main) ./lm.sh --stage 1 --stop_stage 9 --num_splits_lm 1 --nj 16 --ngpu 4 --gpu_inference true --inference_nj 8 --lang en --token_type bpe --nbpe 10000 --bpe_nlsyms data/nlsyms.txt --bpe_train_text data/train/bpe_text --lm_config conf/train_transformer_size768_e12.yaml --train_set train --valid_set dev --test_sets test --inference_lm valid.acc.ave.pth --km_dir  --lm_inference_asr_config conf/decode_lm_asr.yaml --lm_inference_tts_config conf/decode_lm_tts.yaml --lm_test_text_asr dump/raw/test/text.asr --lm_test_text_tts dump/raw/test/text.tts --lm_test_text_textlm dump/raw/test/text.textlm --lm_test_text_speechlm dump/raw/test/text.speechlm --stage 4 --stop_stage 4
    2024-05-11T04:21:48 (lm.sh:357:main) Skipped stages:  11 12 13
    2024-05-11T04:21:48 (lm.sh:478:main) Stage 4a: Data filtering: dump/raw/org -> dump/raw
    train
    Opened file: dump/raw/train
    textlm: dump/raw/train/text/textlm/text
    Creating textlm:  dump/raw/train/lm_text
    Creating speechlm:  dump/raw/train/lm_text
    Creating asr:  dump/raw/train/lm_text
    Creating tts:  dump/raw/train/lm_text
    dev
    Opened file: dump/raw/dev
    textlm: dump/raw/dev/text/textlm/text
    Creating textlm:  dump/raw/dev/lm_text
    Creating speechlm:  dump/raw/dev/lm_text
    Creating asr:  dump/raw/dev/lm_text
    Creating tts:  dump/raw/dev/lm_text
    test
    Opened file: dump/raw/test
    textlm: dump/raw/test/text/textlm/text
    Creating textlm:  dump/raw/test/lm_text
    Creating speechlm:  dump/raw/test/lm_text
    Creating asr:  dump/raw/test/lm_text
    Creating tts:  dump/raw/test/lm_text
    Traceback (most recent call last):
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/local/prepare_bpe_text.py", line 25, in <module>
        if i > args.num:
    AttributeError: 'Namespace' object has no attribute 'num'
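A sketch of the fix (the loop body and data here are illustrative; only the attribute name matters): since the argument is declared as --num_utterances, the loop must read args.num_utterances, as args.num raises the AttributeError above.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--num_utterances", type=int, default=10)
args = parser.parse_args(["--num_utterances", "3"])

kept = []
for i, line in enumerate(["a", "b", "c", "d", "e"]):
    if i > args.num_utterances:  # was: args.num -> AttributeError
        break
    kept.append(line)
print(kept)  # -> ['a', 'b', 'c', 'd']
```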
    

@cromz22
Contributor Author

cromz22 commented May 15, 2024

Stage 4 (cont.)

  • Mixed usage of speechlm_ and unitlm_ as uttid prefix
    • At stage 4, two python scripts are applied to test sets for preprocessing:
      python3 local/prepare_lm_data.py --path ${data_feats}/${dset}

      python3 local/prepare_lm_test.py --test_file "${data_feats}/${_dset}/lm_text" --path "${data_feats}/${_dset}"
    • prepare_lm_data.py adds the prefix unitlm_ to the uttid and writes the utterances to dump/raw/test/lm_text:
      uttid = f"unitlm_{uttid}"
    • On the other hand, prepare_lm_test.py looks for the prefix speechlm_ and writes the output to dump/raw/test/text.speechlm:
      prepare_lm(test_file, out_dir / "text.speechlm", "speechlm_")
    • As there is no check for whether the file is empty, dump/raw/test/text.speechlm ends up empty.
    • At stage 8b, lm_calc_perplexity.py tries to read this file (likely; I haven't checked exactly where):
      ${python} -m espnet2.bin.lm_calc_perplexity \
    • This causes the following error:
      Traceback (most recent call last):
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
          return _run_code(code, main_globals, None,
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/runpy.py", line 86, in _run_code
          exec(code, run_globals)
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/espnet2/bin/lm_calc_perplexity.py", line 204, in <module>
          main()
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/espnet2/bin/lm_calc_perplexity.py", line 200, in main
          calc_perplexity(**kwargs)
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/espnet2/bin/lm_calc_perplexity.py", line 76, in calc_perplexity
          for keys, batch in loader:
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
          data = self._next_data()
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
          return self._process_data(data)
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
          data.reraise()
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/_utils.py", line 705, in reraise
          raise exception
      RuntimeError: Caught RuntimeError in DataLoader worker process 0.
      Original Traceback (most recent call last):
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
          data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
          data.append(next(self.dataset_iter))
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/espnet2/train/iterable_dataset.py", line 241, in __iter__
          raise RuntimeError("No iteration")
      RuntimeError: No iteration
      
    • I believe this involves three problems.
      1. Either speechlm_ or unitlm_ should be used consistently throughout the entire recipe.
      2. Tests should be written for each step.
      3. The mixed usage of Python and shell scripts makes the problem hard to debug. The two Python scripts perform simple text conversions that could be written in a few lines of shell.
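Problem 1 could be addressed by having both scripts share one constant (a sketch; the function names here are illustrative, not the actual helpers in the recipe):

```python
SPEECH_PREFIX = "speechlm_"  # one prefix, shared by both scripts

def tag_uttid(uttid: str) -> str:
    # prepare_lm_data.py side: tag the utterance id
    return f"{SPEECH_PREFIX}{uttid}"

def select_speech_utts(lines):
    # prepare_lm_test.py side: filter by the same prefix
    return [ln for ln in lines if ln.startswith(SPEECH_PREFIX)]

lines = [tag_uttid("utt1") + " a b c", "textlm_utt2 hello"]
print(select_speech_utts(lines))  # -> ['speechlm_utt1 a b c']
```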

@cromz22
Contributor Author

cromz22 commented May 16, 2024

Stage 10

  • Missing TTS inference config file
    lm_inference.py: error: No such file: conf/decode_lm_tts.yaml
    

@wyh2000 Do you know where I can find this file?

@cromz22 cromz22 changed the title [WIP] Bugs in reproducing VoxtLM v1 Bugs in reproducing VoxtLM v1 May 17, 2024
@wyh2000
Contributor

wyh2000 commented May 19, 2024

Stage 10

  • Missing TTS inference config file
    lm_inference.py: error: No such file: conf/decode_lm_tts.yaml
    

@wyh2000 Do you know where I can find this file?

Sorry for the late response. You can find this file and other related configs in https://github.com/espnet/espnet/pull/5694/files#diff-56f69b64742cce2bd1651b6a987285a0e29521692d1c1ec5c20e02274b4aef50

@cromz22
Contributor Author

cromz22 commented May 20, 2024

Thanks!
I opened a pull request: #5782
