
Bugs in reproducing VoxtLM v1 #5777

Open
cromz22 opened this issue May 7, 2024 · 14 comments
Labels
Bug bug should be fixed

Comments

@cromz22
Contributor

cromz22 commented May 7, 2024

Bug description

There seem to be several bugs in reproducing VoxtLM v1 with egs2/voxtlm_v1.
I haven't fully figured out all of them, but I'll start this issue and update it incrementally.
Hopefully I'll send a pull request once all the errors are resolved.

Basic environments

  • OS information: Linux 3.10.0-1160.80.1.el7.x86_64 #1 SMP Tue Nov 8 15:48:59 UTC 2022 x86_64
  • python version: 3.10.14
  • espnet version: 202402
    • Git hash: 0d0428d
    • Commit date: 2024-04-28 23:01:19 +0900
  • pytorch version: 2.3.0

Task information

  • Task: LM
  • Recipe: voxtlm_v1
  • ESPnet2

To reproduce

Steps to reproduce the behavior:

  1. move to the recipe directory cd egs2/voxtlm_v1/lm1
  2. execute run.sh

I'm using reduced data to speed up the debugging process.

Errors

Stage 1

  • Missing path.sh

    • path.sh is missing from the directory and needs to be copied from the template.
  • Missing db.sh

    • db.sh is missing from the directory and needs to be copied from the template and edited to point to the actual data directory.
  • Strange stage number configuration

    if [ ${stage} -le 1 ] && [ ${stop_stage} -ge 1 ]; then

    • This should be stage 3. The wrong number causes problems when --stage 2 is specified for this script.
    • The stage numbering also differs across the per-dataset scripts (some start from 1, others from -1, etc.), which is confusing.
  • gzip: data/librispeech/textlm/librispeech-lm-norm.txt.gz: No such file or directory at local/data_librispeech.sh line 83

  • Logging output causes problems during data preprocessing

  • cp: cannot stat 'data/librispeech/asr/audio': No such file or directory

    cp -r ${data_dir_librispeech_asr}/audio ${data_dir}/${train_dev}
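A minimal sketch of the stage-guard fix (variable values and the echoed message are illustrative, not the recipe's actual code): guarding the block with the stage number it really belongs to means `--stage 2` no longer skips it.

```shell
# Illustrative sketch: the data-preparation block should be guarded by
# its real stage number (3), not 1.
stage=2        # as if the user passed --stage 2
stop_stage=10
ran=no
if [ "${stage}" -le 3 ] && [ "${stop_stage}" -ge 3 ]; then
  ran=yes
  echo "stage 3: LibriSpeech data preparation"
fi
```

With the original `[ ${stage} -le 1 ]` guard, `--stage 2` evaluates `[ 2 -le 1 ]` as false and the block is silently skipped.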

@cromz22 cromz22 added the Bug bug should be fixed label May 7, 2024
@cromz22
Contributor Author

cromz22 commented May 7, 2024

Stage 3

  • (Not exactly a bug) Need to install s3prl before execution
  • FileNotFoundError: [Errno 2] No such file or directory: 'exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl'
    • Full log:
% bash run.sh --stage 3 --stop_stage 3
2024-05-04T02:43:09 (lm.sh:209:main) ./lm.sh --stage 1 --stop_stage 9 --num_splits_lm 1 --nj 16 --ngpu 4 --gpu_inference true --inference_nj 8 --lang en --token_type bpe --nbpe 10000 --bpe_nlsyms data/nlsyms.txt --bpe_train_text data/train/bpe_text --lm_config conf/train_transformer_size768_e12.yaml --train_set train --valid_set dev --test_sets test --inference_lm valid.acc.ave.pth --km_dir  --lm_inference_asr_config conf/decode_lm_asr.yaml --lm_inference_tts_config conf/decode_lm_tts.yaml --lm_test_text_asr dump/raw/test/text.asr --lm_test_text_tts dump/raw/test/text.tts --lm_test_text_textlm dump/raw/test/text.textlm --lm_test_text_speechlm dump/raw/test/text.speechlm --stage 3 --stop_stage 3
2024-05-04T02:43:09 (lm.sh:357:main) Skipped stages:  11 12 13
2024-05-04T02:43:09 (lm.sh:412:main) Stage 3a: Perform Kmeans using hubert_base features
2024-05-04T02:43:09 (perform_kmeans.sh:54:main) scripts/feats/perform_kmeans.sh --stage 1 --stop-stage 4 --train_set train --dev_set dev --other_sets test  --datadir dump/audio_raw/asr --featdir dump/extracted/asr --audio_format flac --feature_type hubert_base --layer 6 --feature_conf {type=s3prl,conf={s3prl_conf={upstream=hubert_base},download_dir=ckpt,multilayer_feature=False,layer=6}} --km_dir exp/kmeans/hubert_base_6_1000clusters --portion 0.1 --nclusters 1000 --storage_save_mode true --use_gpu true --nj 16 --cpu_cmd run.pl --cuda_cmd run.pl --skip_stages 2
2024-05-04T02:43:10 (perform_kmeans.sh:92:main) stage 1: Dump hubert_base feature
utils/subset_data_dir.sh: reducing #utt from 358 to 35
2024-05-04T02:43:10 (perform_kmeans.sh:119:main) Subsampling 35 utterances for feature dumping.
Dump SSL train_subset0.1 features to dump/extracted/asr/hubert_base/layer6/train_subset0.1
utils/copy_data_dir.sh: copied data from dump/audio_raw/asr/train_subset0.1 to dump/extracted/asr/hubert_base/layer6/train_subset0.1
utils/validate_data_dir.sh: Successfully validated data-directory dump/extracted/asr/hubert_base/layer6/train_subset0.1
2024-05-04T02:45:34 (perform_kmeans.sh:216:main) stage 3: Generate K-means pseudo-labels
2024-05-04T02:45:34 (perform_kmeans.sh:225:main) Extract labels to dump/extracted/asr/hubert_base/layer6/train
utils/copy_data_dir.sh: copied data from dump/audio_raw/asr/train to dump/extracted/asr/hubert_base/layer6/train
utils/validate_data_dir.sh: Successfully validated data-directory dump/extracted/asr/hubert_base/layer6/train
run.pl: 16 / 16 failed, log is in dump/extracted/asr/hubert_base/layer6/train/logdir/inference_pseudo_labels_km1000.*.log

% cat dump/extracted/asr/hubert_base/layer6/train/logdir/inference_pseudo_labels_km1000.1.log
# python3 pyscripts/feats/dump_km_label.py --in_filetype sound --online_feature_extract true --feature_conf "{type=s3prl,conf={s3prl_conf={upstream=hubert_base},download_dir=ckpt,multilayer_feature=False,layer=6}}" --km_path exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl --out_filetype mat --use_gpu true --utt2num_samples dump/extracted/asr/hubert_base/layer6/train/logdir/utt2num_samples.1 scp:dump/extracted/asr/hubert_base/layer6/train/logdir/inference_kmeans.1.scp ark,t:dump/extracted/asr/hubert_base/layer6/train/logdir/pseudo_labels_km1000.1.txt
# Started at Sat May  4 02:45:35 JST 2024
#
2024-05-04 02:45:39 | INFO | root | Namespace(km_path='exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl', use_gpu=True, online_feature_extract=True, feature_conf='{type=s3prl,conf={s3prl_conf={upstream=hubert_base},download_dir=ckpt,multilayer_feature=False,layer=6}}', batch_bins=1, utt2num_samples='dump/extracted/asr/hubert_base/layer6/train/logdir/utt2num_samples.1', in_filetype='sound', out_filetype='mat', rspecifier='scp:dump/extracted/asr/hubert_base/layer6/train/logdir/inference_kmeans.1.scp', wspecifier='ark,t:dump/extracted/asr/hubert_base/layer6/train/logdir/pseudo_labels_km1000.1.txt')
Traceback (most recent call last):
  File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/pyscripts/feats/dump_km_label.py", line 184, in <module>
    dump_label(**vars(args))
  File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/pyscripts/feats/dump_km_label.py", line 133, in dump_label
    apply_kmeans = ApplyKmeans(km_path, use_gpu=use_gpu)
  File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/pyscripts/feats/dump_km_label.py", line 90, in __init__
    self.km_model = joblib.load(km_path)
  File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 650, in load
    with open(filename, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl'
# Accounting: time=5 threads=1
# Ended (code 1) at Sat May  4 02:45:40 JST 2024, elapsed time 5 seconds

@sw005320
Contributor

sw005320 commented May 7, 2024

Many thanks for your report.
@wyh2000, can you help Shuichiro?

@wyh2000
Contributor

wyh2000 commented May 7, 2024

Many thanks for your report. @wyh2000, can you help Shuichiro?

Yes! I think these errors are because some files are missing. I'll share the paths and files with you, @cromz22.

@wyh2000
Contributor

wyh2000 commented May 7, 2024

Stage 3

  • (Not exactly a bug) Need to install s3prl before execution

  • FileNotFoundError: [Errno 2] No such file or directory: 'exp/kmeans/hubert_base_6_1000clusters/km_1000.mdl'

    • Full log: (identical to the log quoted in the comment above)
For this file, you could access it from here https://huggingface.co/soumi-maiti/voxtlm-k1000/blob/main/km_1000.mdl
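A sketch of placing the released model where perform_kmeans.sh expects it (the `resolve/main` URL form is an assumption based on Hugging Face's usual download convention; the download line is left commented out here so the snippet makes no network access).

```shell
# Sketch: put the released k-means model at the path the recipe expects.
km_dir=exp/kmeans/hubert_base_6_1000clusters
mkdir -p "${km_dir}"
url=https://huggingface.co/soumi-maiti/voxtlm-k1000/resolve/main/km_1000.mdl
# wget -O "${km_dir}/km_1000.mdl" "${url}"   # uncomment to actually download
echo "would fetch ${url} into ${km_dir}"
```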

@cromz22
Contributor Author

cromz22 commented May 7, 2024

Thank you! I understand the problem now.
The stage for learning k-means should not be skipped, but the default setting skips it, which silently causes an error at the next stage.

@wyh2000
Contributor

wyh2000 commented May 7, 2024

Thank you! I understand the problem now. The stage for learning k-means should not be skipped, but the default setting skips it, which silently causes an error at the next stage.

Yes, that's right. But if you want to use the pretrained VoxtLM model, I recommend skipping the k-means learning stage and using the released pretrained model.

@cromz22
Contributor Author

cromz22 commented May 7, 2024

Thanks, my objective is not just to use it but to execute all the steps myself to understand and improve upon it. Also, I think it's important to make the recipe fully reproducible.

@sw005320
Contributor

sw005320 commented May 7, 2024

Thanks, my objective is not just to use it but to execute all the steps myself to understand and improve upon it. Also, I think it's important to make the recipe fully reproducible.

I second that.
We should make this fully reproducible.

@cromz22
Contributor Author

cromz22 commented May 10, 2024

Stage 3 (cont.)

  • Invalid handling of loop iterable

    for dset in "${train_set} ${valid_set}" ${test_sets}; do

    • Because ${train_set} and ${valid_set} are merged into a single quoted string, the train and dev files are not copied. This causes the following error at the next stage:
    % bash run.sh --stage 4 --stop_stage 4
    2024-05-11T03:46:31 (lm.sh:209:main) ./lm.sh --stage 1 --stop_stage 9 --num_splits_lm 1 --nj 16 --ngpu 4 --gpu_inference true --inference_nj 8 --lang en --token_type bpe --nbpe 10000 --bpe_nlsyms data/nlsyms.txt --bpe_train_text data/train/bpe_text --lm_config conf/train_transformer_size768_e12.yaml --train_set train --valid_set dev --test_sets test --inference_lm valid.acc.ave.pth --km_dir  --lm_inference_asr_config conf/decode_lm_asr.yaml --lm_inference_tts_config conf/decode_lm_tts.yaml --lm_test_text_asr dump/raw/test/text.asr --lm_test_text_tts dump/raw/test/text.tts --lm_test_text_textlm dump/raw/test/text.textlm --lm_test_text_speechlm dump/raw/test/text.speechlm --stage 4 --stop_stage 4
    2024-05-11T03:46:31 (lm.sh:357:main) Skipped stages:  11 12 13
    2024-05-11T03:46:31 (lm.sh:476:main) Stage 4a: Data filtering: dump/raw/org -> dump/raw
    train
    Opened file: dump/raw/train
    textlm: dump/raw/train/text/textlm/text
    Traceback (most recent call last):
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/local/prepare_lm_data.py", line 177, in <module>
        prepare_textlm(
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/local/prepare_lm_data.py", line 33, in prepare_textlm
        uttid2text = read_text(root / "text")
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/local/prepare_lm_data.py", line 12, in read_text
        with text.open("r") as fp:
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/pathlib.py", line 1119, in open
        return self._accessor.open(self, mode, buffering, encoding, errors,
    FileNotFoundError: [Errno 2] No such file or directory: 'dump/raw/train/text/textlm/text'
    
  • (Not a bug) Inefficient copying of files

    • This is fine for a small amount of data, but for large data the copying is inefficient; creating symbolic links should be enough.
      cp "${_dir}/text" "${data_feats}/${dset}/text/$(basename ${_dir})/"
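A sketch of the loop fix (set names assumed from the recipe defaults): removing the quotes lets the shell word-split the list, so train and dev are visited individually instead of as the single string "train dev", which matches no directory.

```shell
# Illustrative values; in the recipe these come from run.sh.
train_set=train
valid_set=dev
test_sets=test
visited=""
# Unquoted, so each set becomes its own loop iteration.
for dset in ${train_set} ${valid_set} ${test_sets}; do
  visited="${visited}${dset} "
done
echo "${visited}"
```

For the copying concern above, replacing `cp` with `ln -s` on the text files would avoid duplicating large data.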

@cromz22
Contributor Author

cromz22 commented May 10, 2024

Stage 4

  • Wrong variable name in local/prepare_bpe_text.py (num -> num_utterances)

    % bash run.sh --stage 4 --stop_stage 4
    2024-05-11T04:21:47 (lm.sh:209:main) ./lm.sh --stage 1 --stop_stage 9 --num_splits_lm 1 --nj 16 --ngpu 4 --gpu_inference true --inference_nj 8 --lang en --token_type bpe --nbpe 10000 --bpe_nlsyms data/nlsyms.txt --bpe_train_text data/train/bpe_text --lm_config conf/train_transformer_size768_e12.yaml --train_set train --valid_set dev --test_sets test --inference_lm valid.acc.ave.pth --km_dir  --lm_inference_asr_config conf/decode_lm_asr.yaml --lm_inference_tts_config conf/decode_lm_tts.yaml --lm_test_text_asr dump/raw/test/text.asr --lm_test_text_tts dump/raw/test/text.tts --lm_test_text_textlm dump/raw/test/text.textlm --lm_test_text_speechlm dump/raw/test/text.speechlm --stage 4 --stop_stage 4
    2024-05-11T04:21:48 (lm.sh:357:main) Skipped stages:  11 12 13
    2024-05-11T04:21:48 (lm.sh:478:main) Stage 4a: Data filtering: dump/raw/org -> dump/raw
    train
    Opened file: dump/raw/train
    textlm: dump/raw/train/text/textlm/text
    Creating textlm:  dump/raw/train/lm_text
    Creating speechlm:  dump/raw/train/lm_text
    Creating asr:  dump/raw/train/lm_text
    Creating tts:  dump/raw/train/lm_text
    dev
    Opened file: dump/raw/dev
    textlm: dump/raw/dev/text/textlm/text
    Creating textlm:  dump/raw/dev/lm_text
    Creating speechlm:  dump/raw/dev/lm_text
    Creating asr:  dump/raw/dev/lm_text
    Creating tts:  dump/raw/dev/lm_text
    test
    Opened file: dump/raw/test
    textlm: dump/raw/test/text/textlm/text
    Creating textlm:  dump/raw/test/lm_text
    Creating speechlm:  dump/raw/test/lm_text
    Creating asr:  dump/raw/test/lm_text
    Creating tts:  dump/raw/test/lm_text
    Traceback (most recent call last):
      File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/egs2/voxtlm_v1/lm1_partial/local/prepare_bpe_text.py", line 25, in <module>
        if i > args.num:
    AttributeError: 'Namespace' object has no attribute 'num'
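A sketch of the fix (the loop body and data here are illustrative; only the attribute name matters): since the argument is declared as --num_utterances, the loop must read args.num_utterances, as args.num raises the AttributeError above.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--num_utterances", type=int, default=10)
args = parser.parse_args(["--num_utterances", "3"])

kept = []
for i, line in enumerate(["a", "b", "c", "d", "e"]):
    if i > args.num_utterances:  # was: args.num -> AttributeError
        break
    kept.append(line)
print(kept)  # -> ['a', 'b', 'c', 'd']
```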
    

@cromz22
Contributor Author

cromz22 commented May 15, 2024

Stage 4 (cont.)

  • Mixed usage of speechlm_ and unitlm_ as uttid prefix
    • At stage 4, two python scripts are applied to test sets for preprocessing:
      python3 local/prepare_lm_data.py --path ${data_feats}/${dset}

      python3 local/prepare_lm_test.py --test_file "${data_feats}/${_dset}/lm_text" --path "${data_feats}/${_dset}"
    • prepare_lm_data.py adds the prefix unitlm_ to the uttid and writes the utterances to dump/raw/test/lm_text:
      uttid = f"unitlm_{uttid}"
    • On the other hand, prepare_lm_test.py looks for the prefix speechlm_ and writes the output to dump/raw/test/text.speechlm:
      prepare_lm(test_file, out_dir / "text.speechlm", "speechlm_")
    • As there is no check for whether the file is empty, dump/raw/test/text.speechlm ends up empty.
    • At stage 8b, lm_calc_perplexity.py tries to read this file (likely; I haven't checked exactly where):
      ${python} -m espnet2.bin.lm_calc_perplexity \
    • This causes the following error:
      Traceback (most recent call last):
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
          return _run_code(code, main_globals, None,
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/runpy.py", line 86, in _run_code
          exec(code, run_globals)
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/espnet2/bin/lm_calc_perplexity.py", line 204, in <module>
          main()
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/espnet2/bin/lm_calc_perplexity.py", line 200, in main
          calc_perplexity(**kwargs)
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/espnet2/bin/lm_calc_perplexity.py", line 76, in calc_perplexity
          for keys, batch in loader:
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
          data = self._next_data()
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
          return self._process_data(data)
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
          data.reraise()
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/_utils.py", line 705, in reraise
          raise exception
      RuntimeError: Caught RuntimeError in DataLoader worker process 0.
      Original Traceback (most recent call last):
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
          data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/.conda/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
          data.append(next(self.dataset_iter))
        File "/share02/SLC-G/intern/sshimizu/slm/voxtlm/espnet/espnet2/train/iterable_dataset.py", line 241, in __iter__
          raise RuntimeError("No iteration")
      RuntimeError: No iteration
      
    • I believe this involves three problems.
      1. Either speechlm_ or unitlm_ should be used consistently throughout the entire recipe.
      2. Tests should be written for each step.
      3. The mixed usage of Python and shell scripts makes the problem hard to debug. The two Python scripts perform simple text conversions that could be written in a few lines of shell.
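Problem 1 could be addressed by having both scripts share one constant (a sketch; the function names here are illustrative, not the actual helpers in the recipe):

```python
SPEECH_PREFIX = "speechlm_"  # one prefix, shared by both scripts

def tag_uttid(uttid: str) -> str:
    # prepare_lm_data.py side: tag the utterance id
    return f"{SPEECH_PREFIX}{uttid}"

def select_speech_utts(lines):
    # prepare_lm_test.py side: filter by the same prefix
    return [ln for ln in lines if ln.startswith(SPEECH_PREFIX)]

lines = [tag_uttid("utt1") + " a b c", "textlm_utt2 hello"]
print(select_speech_utts(lines))  # -> ['speechlm_utt1 a b c']
```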

@cromz22
Contributor Author

cromz22 commented May 16, 2024

Stage 10

  • Missing TTS inference config file
    lm_inference.py: error: No such file: conf/decode_lm_tts.yaml
    

@wyh2000 Do you know where I can find this file?

@cromz22 cromz22 changed the title [WIP] Bugs in reproducing VoxtLM v1 Bugs in reproducing VoxtLM v1 May 17, 2024
@wyh2000
Contributor

wyh2000 commented May 19, 2024

Stage 10

  • Missing TTS inference config file
    lm_inference.py: error: No such file: conf/decode_lm_tts.yaml
    

@wyh2000 Do you know where I can find this file?

Sorry for the late response. You can find this file and other related configs in https://github.com/espnet/espnet/pull/5694/files#diff-56f69b64742cce2bd1651b6a987285a0e29521692d1c1ec5c20e02274b4aef50

@cromz22
Contributor Author

cromz22 commented May 20, 2024

Thanks!
I opened a pull request: #5782
