
Serious bug in libri_train.py #82

Open
michaelklachko opened this issue Jul 8, 2018 · 4 comments

Comments

@michaelklachko

LibriSpeech dataset (e.g. train-clean-100) is split into multiple directories during preprocessing. Then during training, the code iterates through these directories:
https://github.com/zzw922cn/Automatic_Speech_Recognition/blob/master/speechvalley/main/libri_train.py#L159

The problem is that for each directory, a new model is created according to the maxTimeSteps parameter of the inputs in that directory. This means that if we have 8 directories for the train-clean-100 dataset, we are training 8 separate models which don't share their weights (in fact, every time a model saves a checkpoint, it overwrites the checkpoint saved by the previous model).

This means that we are effectively training only one model out of 8, and we are training it only on data in the last directory (so we are using 1/8 of the dataset).
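The failure mode, and the obvious fix, can be sketched in plain Python (the directory names and step counts below are made up for illustration; they are not from the repo):

```python
# Hypothetical per-directory maxTimeSteps values, as produced by preprocessing.
dir_max_steps = {"train-clean-100-0": 1200,
                 "train-clean-100-1": 900,
                 "train-clean-100-2": 4000}

# Buggy pattern described above: a fresh graph per directory, each sized to
# that directory's own maximum. Every "model" here is just its input length,
# so three directories means three incompatible graphs sharing one checkpoint.
per_dir_models = [steps for steps in dir_max_steps.values()]

# Fix sketch: compute one global maximum up front, build a single graph with
# it (or with a dynamic time dimension), and reuse it for every directory so
# all saved checkpoints belong to the same set of weights.
global_max_steps = max(dir_max_steps.values())
```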

@michaelklachko
Author

michaelklachko commented Jul 9, 2018

I can put all training .npy files into one directory, but the real problem is that the model would have to fit the largest sample in the whole dataset: if the largest sample is 4000 time steps, then every sample would need to be padded to this size. This would make training extremely slow.

Look at https://github.com/fordDeepDSP/deepSpeech code for a better solution (bucketing sorted inputs).
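The bucketing idea mentioned above can be sketched in a few lines of plain Python (the function name and sample lengths are illustrative, not from either repo):

```python
def bucket_batches(lengths, batch_size):
    """Group sample indices so each batch contains similar-length samples.

    Sorting by length before slicing into batches means each batch only pads
    up to its own maximum, not the global maximum of the whole dataset.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

# Lengths in time steps: a mix of very short and very long utterances.
lengths = [4000, 120, 130, 3900, 125, 3850]
batches = bucket_batches(lengths, batch_size=3)

# Wasted (padded) time steps per scheme:
bucketed_pad = sum(max(lengths[i] for i in b) * len(b)
                   - sum(lengths[i] for i in b) for b in batches)
global_pad = max(lengths) * len(lengths) - sum(lengths)
```

Here the bucketed batches waste 265 padded steps versus 11875 when everything is padded to the global maximum of 4000.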

@RoyJames

RoyJames commented Oct 1, 2019

From L171, I think the logic is to restore saved parameters trained on previous folders? So I guess it's not training 8 separate models if the keep option is set to True.

                if keep == True:
                    ckpt = tf.train.get_checkpoint_state(savedir)
                    if ckpt and ckpt.model_checkpoint_path:
                        model.saver.restore(sess, ckpt.model_checkpoint_path)
                        print('Model restored from:' + savedir)

@michaelklachko
Author

I got this repo to work, but it took a lot of effort and many bug fixes. In the end, it's just not worth it: this repo has pretty much been abandoned, and there are better repos available (fordDeepDSP, Mozilla, or SeanNaren's excellent PyTorch implementation). Also, DeepSpeech is pretty old; there are now better architectures, for example Jasper or transducer-based ones. Don't waste your time on this one.

@RoyJames

RoyJames commented Oct 1, 2019

> I got this repo to work, but it took a lot of effort and many bug fixes. In the end, it's just not worth it: this repo has pretty much been abandoned, and there are better repos available (fordDeepDSP, Mozilla, or SeanNaren's excellent PyTorch implementation). Also, DeepSpeech is pretty old; there are now better architectures, for example Jasper or transducer-based ones. Don't waste your time on this one.

I kinda agree after trying this repo on LibriSpeech. And thank you for the pointers. I also checked fordDeepDSP and SeanNaren's DeepSpeech2 PyTorch implementation, but I still see people there having trouble getting reasonable WER/CER without getting responses. I just want to train on LibriSpeech and might have to try Kaldi now.
