'nan' loss function when using layer normalization #13
Hi,

I was using only the LayerNormalization from your code in mine. I didn't change anything in the code, apart from overriding the `compute_mask` function, as my input is an `Embedding` with `mask_zero=True` (a sketch of such an override is below).
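For reference, a minimal sketch of what such a `compute_mask` override might look like. This is not the exact code from the issue: the class name is made up, and `tf.keras`'s built-in `LayerNormalization` stands in for the layer from this repository; the override simply forwards the Embedding's padding mask.

```python
import tensorflow as tf
from tensorflow import keras


class MaskPassthroughLayerNorm(keras.layers.LayerNormalization):
    """Hypothetical sketch: layer normalization that forwards the input mask."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Declare that this layer accepts masked inputs instead of raising.
        self.supports_masking = True

    def compute_mask(self, inputs, mask=None):
        # Forward the Embedding's padding mask unchanged, so downstream
        # layers still see which timesteps are padding.
        return mask


# Usage: the mask produced by mask_zero=True survives the normalization.
tokens = keras.Input(shape=(None,), dtype="int32")
embedded = keras.layers.Embedding(input_dim=1000, output_dim=64, mask_zero=True)(tokens)
normalized = MaskPassthroughLayerNorm()(embedded)
model = keras.Model(tokens, normalized)
```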
But strangely I get `nan` for all the measurements I do while training and tuning (the loss function and others). I tried using other implementations of the LayerNormalization layer (e.g. https://github.com/CyberZHG/keras-layer-normalization), and everything works without problem. I was wondering whether you have any clue about that.

Comments

CyberZHG's implementation computes

`variance = K.mean(K.square(inputs - mean), axis=-1, keepdims=True)`
`std = K.sqrt(variance + self.epsilon)`

whereas my implementation uses

`std = K.std(x, axis=-1, keepdims=True)`

I think maybe there are input sequences of length 0, where the whole sequence is masked.
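A minimal numeric sketch of that diagnosis (tf.keras backend ops are used here so it runs standalone; the shape and epsilon value are illustrative): a fully masked sequence has constant features, so its variance is exactly zero, and only the epsilon-inside-the-sqrt form survives the division.

```python
import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.zeros((1, 4))  # one fully masked sequence: all features identical
mean = K.mean(x, axis=-1, keepdims=True)

# Epsilon inside the sqrt (CyberZHG's form): std is strictly positive,
# so the normalization (x - mean) / std stays finite.
variance = K.mean(K.square(x - mean), axis=-1, keepdims=True)
std_safe = K.sqrt(variance + 1e-5)
print(((x - mean) / std_safe).numpy())  # [[0. 0. 0. 0.]]

# K.std directly: sqrt(0) == 0, and 0 / 0 is nan. Even if epsilon is added
# to std after the sqrt, the gradient of sqrt at zero variance is infinite,
# which is enough to turn the loss into nan during training.
std_raw = K.std(x, axis=-1, keepdims=True)
print(((x - mean) / std_raw).numpy())   # [[nan nan nan nan]]
```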