Regression by PhasedLSTM with a gradient explosion #21

Open
hnchang opened this issue Apr 17, 2020 · 1 comment

hnchang commented Apr 17, 2020

Hello,

When I used PhasedLSTM (PLSTM) for regression (to learn the mapping between an input sequence and an output sequence), I got "nan" in the weights and in the loss at the beginning of the first epoch, even though I used gradient clipping.

The generated training data (slightly modified from https://fairyonice.github.io/Extract-weights-from-Keras's-LSTM-and-calcualte-hidden-and-cell-states.html):

[screenshot: training_partial_samples]

The model is compiled as follows:
model.compile(loss="mean_squared_error", sample_weight_mode="temporal", optimizer = keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))
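For reference, one way gradient clipping can be attached to this optimizer in Keras is through the clipnorm (or clipvalue) argument. The sketch below only reuses the compile call above; clipnorm=1.0 is an illustrative value, not a setting taken from the linked script.

  # Sketch: same compile call, with gradient clipping added on the optimizer.
  # clipnorm=1.0 is an assumed example value, not from the original script.
  import keras

  opt = keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999,
                              epsilon=1e-08, decay=0.0, clipnorm=1.0)
  model.compile(loss="mean_squared_error",
                sample_weight_mode="temporal",
                optimizer=opt)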

After checking the weights in the PLSTM layer, I found the values of the timegate kernel getting larger and larger, until the weights became "nan" (the first two rows).

[screenshot: large_timegate_weights]

When I switched to a standard LSTM (all other settings and the learning rate [still 0.01] unchanged), the loss converged. I therefore traced the PLSTM source code, suspecting that the initialization of timegate_kernel matters, but I have been stuck for a long time with little progress.
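To see exactly when the timegate kernel starts to diverge, a small callback can print its magnitude after every batch. This is only a sketch: the layer index (0) is an assumption about how the model in the linked script is built, and TerminateOnNaN simply stops training once a nan appears.

  # Sketch: print the largest absolute value of each weight array in the PLSTM
  # layer after every batch, to spot when the timegate kernel blows up.
  # The layer index (0) is an assumption about the model's structure.
  import numpy as np
  import keras

  class TimegateMonitor(keras.callbacks.Callback):
      def __init__(self, layer_index=0):
          super(TimegateMonitor, self).__init__()
          self.layer_index = layer_index

      def on_batch_end(self, batch, logs=None):
          weights = self.model.layers[self.layer_index].get_weights()
          print([float(np.abs(w).max()) for w in weights])

  # model.fit(x, y, callbacks=[TimegateMonitor(), keras.callbacks.TerminateOnNaN()])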

I am wondering if anyone has had a similar issue. Any suggestion on how to find out why the gradient explodes is appreciated. The relevant code is at this link:

https://github.com/hnchang/Regression-with-PhasedLSTM/blob/master/reg_plstm.py

Many thanks,
James


ntlex commented May 13, 2020

Hey James,

I am having a similar issue here. Two things that have worked for me:

  1. Reduce the learning rate (on a schedule or manually)
  2. Use gradient clipping to prevent the gradients from exploding (a sketch of both is below this list): https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/
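A minimal sketch of both suggestions in Keras; the learning rate, clipvalue, and ReduceLROnPlateau settings are illustrative assumptions, not tuned values.

  # Sketch: lower learning rate plus value clipping on the optimizer, and a
  # callback that reduces the learning rate when the loss plateaus.
  import keras

  opt = keras.optimizers.Adam(lr=0.001, clipvalue=0.5)
  model.compile(loss="mean_squared_error", optimizer=opt)

  reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="loss", factor=0.5,
                                                patience=2, min_lr=1e-5)
  # model.fit(x, y, callbacks=[reduce_lr])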

I hope this helps.
