
Serious bug in libri_train.py #82

Open
michaelklachko opened this issue Jul 8, 2018 · 4 comments

Comments

@michaelklachko

LibriSpeech dataset (e.g. train-clean-100) is split into multiple directories during preprocessing. Then during training, the code iterates through these directories:
https://github.com/zzw922cn/Automatic_Speech_Recognition/blob/master/speechvalley/main/libri_train.py#L159

The problem is that for each directory, a new model is created according to the maxTimeSteps parameter of the inputs in that directory. This means that if we have 8 directories for the train-clean-100 dataset, we are training 8 separate models which don't share their weights (in fact, every time a model saves a checkpoint, it overwrites the checkpoint saved by the previous model).

This means that we are effectively training only one model out of 8, and we are training it only on data in the last directory (so we are using 1/8 of the dataset).
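The failure mode, and the obvious fix, can be sketched in plain Python (the directory names and step counts below are made up for illustration; they are not from the repo):

```python
# Hypothetical per-directory maxTimeSteps values, as produced by preprocessing.
dir_max_steps = {"train-clean-100-0": 1200,
                 "train-clean-100-1": 900,
                 "train-clean-100-2": 4000}

# Buggy pattern described above: a fresh graph per directory, each sized to
# that directory's own maximum. Every "model" here is just its input length,
# so three directories means three incompatible graphs sharing one checkpoint.
per_dir_models = [steps for steps in dir_max_steps.values()]

# Fix sketch: compute one global maximum up front, build a single graph with
# it (or with a dynamic time dimension), and reuse it for every directory so
# all saved checkpoints belong to the same set of weights.
global_max_steps = max(dir_max_steps.values())
```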

@michaelklachko
Author

michaelklachko commented Jul 9, 2018

I can put all training .npy files into one directory, but the real problem is that the model would have to fit the largest sample in the whole dataset: if the largest sample is 4000 time steps, then every sample would need to be padded to this size. This would make training extremely slow.

Look at https://github.com/fordDeepDSP/deepSpeech code for a better solution (bucketing sorted inputs).
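The bucketing idea mentioned above can be sketched in a few lines of plain Python (the function name and sample lengths are illustrative, not from either repo):

```python
def bucket_batches(lengths, batch_size):
    """Group sample indices so each batch contains similar-length samples.

    Sorting by length before slicing into batches means each batch only pads
    up to its own maximum, not the global maximum of the whole dataset.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

# Lengths in time steps: a mix of very short and very long utterances.
lengths = [4000, 120, 130, 3900, 125, 3850]
batches = bucket_batches(lengths, batch_size=3)

# Wasted (padded) time steps per scheme:
bucketed_pad = sum(max(lengths[i] for i in b) * len(b)
                   - sum(lengths[i] for i in b) for b in batches)
global_pad = max(lengths) * len(lengths) - sum(lengths)
```

Here the bucketed batches waste 265 padded steps versus 11875 when everything is padded to the global maximum of 4000.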

@RoyJames

RoyJames commented Oct 1, 2019

From L171, I think the logic is to restore saved parameters trained on previous folders? So I guess it's not training 8 separate models if the keep option is set to True.

                if keep == True:
                    ckpt = tf.train.get_checkpoint_state(savedir)
                    if ckpt and ckpt.model_checkpoint_path:
                        model.saver.restore(sess, ckpt.model_checkpoint_path)
                        print('Model restored from:' + savedir)

@michaelklachko
Author

I got this repo to work, but it took a lot of effort and many bug fixes. In the end, it's just not worth it: this repo has pretty much been abandoned, and there are better repos available (fordDeepDSP, Mozilla, or SeanNaren's excellent PyTorch implementation). Also, DeepSpeech is pretty old; there are now better architectures, for example Jasper or transducer-based ones. Don't waste your time on this one.

@RoyJames

RoyJames commented Oct 1, 2019

> I got this repo to work, but it took a lot of effort and many bug fixes. In the end, it's just not worth it: this repo has pretty much been abandoned, and there are better repos available (fordDeepDSP, Mozilla, or SeanNaren's excellent PyTorch implementation). Also, DeepSpeech is pretty old; there are now better architectures, for example Jasper or transducer-based ones. Don't waste your time on this one.

I kinda agree after trying this repo on LibriSpeech. And thank you for the pointers. I also checked fordDeepDSP and SeanNaren's DeepSpeech2 PyTorch implementation, but I still see people there having trouble getting reasonable WER/CER without getting responses. I just want to train on LibriSpeech and might have to try Kaldi now.
