Hello,
When I used PhasedLSTM (PLSTM) for regression (to find the correlation between an input sequence and an output sequence), I got "nan" in the weights, and also in the loss, at the beginning of the first epoch, even though I used gradient clipping.
The training data is generated synthetically (slightly modified from https://fairyonice.github.io/Extract-weights-from-Keras's-LSTM-and-calcualte-hidden-and-cell-states.html).
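For context, here is a minimal sketch of the kind of synthetic sequence data that post generates; the shapes, noise level, and target mapping below are illustrative assumptions, not the exact values in my script:

```python
import numpy as np

# Sketch only: shapes, noise level, and the target mapping are illustrative
# assumptions, not the exact values used in the actual training script.
np.random.seed(0)
n_samples, timesteps = 1000, 10

t = np.linspace(0, 2 * np.pi, timesteps)
X = np.array([np.sin(t + np.random.uniform(0, 2 * np.pi))
              for _ in range(n_samples)])              # (n_samples, timesteps)
X += 0.1 * np.random.randn(n_samples, timesteps)       # additive noise
y = 0.5 * X + 0.1                                      # per-timestep regression target

# Keras recurrent layers expect input of shape (samples, timesteps, features)
X = X[..., np.newaxis]
y = y[..., np.newaxis]
```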
The optimizer is as follows:

```python
model.compile(loss="mean_squared_error",
              sample_weight_mode="temporal",
              optimizer=keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999,
                                              epsilon=1e-08, decay=0.0))
```
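For reference, in this version of Keras gradient clipping is specified on the optimizer itself; a minimal sketch of adding it (the clipnorm value of 1.0 is only an example, not a recommendation):

```python
# Gradient clipping in this Keras version is set on the optimizer via
# clipnorm (clip each gradient tensor by norm) or clipvalue (clip
# element-wise); the value 1.0 below is only an example.
optimizer = keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999,
                                  epsilon=1e-08, decay=0.0, clipnorm=1.0)
model.compile(loss="mean_squared_error",
              sample_weight_mode="temporal",
              optimizer=optimizer)
```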
After checking the weights in the PLSTM layer, I found that the values of the timegate kernel grow larger and larger until the weights become "nan" (in the first two rows).
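Here is a sketch of how the weights can be watched batch by batch; the assumption that the PLSTM is model.layers[0] depends on the model, and which entry of get_weights() is the timegate kernel depends on the PLSTM implementation, so this simply reports all weight tensors:

```python
import numpy as np
from keras.callbacks import LambdaCallback

# Assumption: the PLSTM is the first layer of the model. Which entry of
# get_weights() is the timegate kernel depends on the implementation,
# so this reports every weight tensor in the layer.
plstm_layer = model.layers[0]

def report_weights(batch, logs):
    for i, w in enumerate(plstm_layer.get_weights()):
        print("batch %d, weight %d: max |w| = %g, finite = %s"
              % (batch, i, np.abs(w).max(), np.all(np.isfinite(w))))

weight_monitor = LambdaCallback(on_batch_end=report_weights)
# model.fit(X, y, callbacks=[weight_monitor], ...)
```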
When I switched to a standard LSTM (keeping the other settings and the learning rate [still 0.01] the same), the loss converged. I therefore traced the PLSTM source code, suspecting that the initialization of timegate_kernel matters, but I have been stuck for a long time with little progress.
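One way to test this hypothesis without patching the PLSTM source is to shrink the timegate weights right after the model is built; a crude sketch, where the position of the timegate kernel inside get_weights() is an assumption:

```python
# Crude experiment to test whether the initial scale of the timegate kernel
# drives the blow-up: shrink it after the model is built. Assumptions: the
# PLSTM is model.layers[0], and the timegate kernel is the last entry of
# get_weights(); both depend on the implementation.
plstm_layer = model.layers[0]
weights = plstm_layer.get_weights()
weights[-1] = 0.1 * weights[-1]    # assumed timegate kernel; scale it down
plstm_layer.set_weights(weights)
```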
I am wondering if anyone has run into a similar issue. Any suggestions for finding out why the gradient explodes would be appreciated. The relevant code is at this link:
https://github.com/hnchang/Regression-with-PhasedLSTM/blob/master/reg_plstm.py
Many thanks,
James