Cannot take gradient of L2 regularization loss #2441
This ought to work, and does for me:

```julia
julia> gradient(s_ -> sum(sqnorm, Flux.params(s_)), s)  # as above
((layers = ((weight = Float32[-0.18066745 -0.4179064; 0.3016829 -0.4228169; … ; -0.36133823 -0.23173195; 0.45555136 -0.12170375], bias = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], σ = nothing), (weight = Float32[-0.031820923 -0.41430357 … 0.33881077 0.35217345; -0.03208663 0.039828066 … -0.3371693 -0.34633902], bias = Float32[0.0, 0.0], σ = nothing)),),)

julia> import Optimisers

julia> gradient(s_ -> sum(sqnorm, Optimisers.trainables(s_)), s)  # new way, same numbers
((layers = ((weight = Float32[-0.18066745 -0.4179064; 0.3016829 -0.4228169; … ; -0.36133823 -0.23173195; 0.45555136 -0.12170375], bias = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], σ = nothing), (weight = Float32[-0.031820923 -0.41430357 … 0.33881077 0.35217345; -0.03208663 0.039828066 … -0.3371693 -0.34633902], bias = Float32[0.0, 0.0], σ = nothing)),),)
```
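For anyone landing here without the earlier context: the thread does not show how `sqnorm` or the model `s` were defined, so here is a self-contained sketch of the same pattern. The `sqnorm` helper follows the usual definition from the Flux regularisation docs, and the two-layer `Chain` is a hypothetical stand-in:

```julia
# Sketch only: `sqnorm` and the model `s` are assumptions, since the
# thread does not show their definitions.
using Flux
import Optimisers

sqnorm(x) = sum(abs2, x)  # squared L2 norm of one parameter array

s = Chain(Dense(2 => 10, relu), Dense(10 => 2))  # hypothetical model

# L2 penalty summed over every trainable array, differentiated
# with respect to the model itself:
g = gradient(s_ -> sum(sqnorm, Optimisers.trainables(s_)), s)

# The gradient of sum(abs2, W) with respect to W is 2 .* W, so the
# returned structure mirrors the model with 2 .* weight in each slot.
```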
```julia
help?> Optimisers.WeightDecay
  WeightDecay(λ = 5e-4)

  Implements L_2 regularisation, also known as ridge regression,
  when composed with other rules as the first transformation in an
  OptimiserChain.

  It does this by adding λ .* x to the gradient. This is equivalent to
  adding λ/2 * sum(abs2, x) == λ/2 * norm(x)^2 to the loss.

  See also [SignDecay] for L_1 normalisation.
```
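The equivalence stated in that docstring is easy to check numerically: stepping with `WeightDecay` and a zero loss-gradient moves the parameters exactly as if the gradient of `λ/2 * sum(abs2, x)`, i.e. `λ .* x`, had been supplied. A minimal sketch (the values and step size are illustrative, not from the thread):

```julia
using Optimisers

λ = 5f-4
x = Float32[1.0, -2.0, 3.0]

# WeightDecay adds λ .* x to the incoming gradient, which is precisely
# the gradient of the penalty λ/2 * sum(abs2, x):
penalty_grad = λ .* x

rule  = OptimiserChain(WeightDecay(λ), Descent(0.1f0))
state = Optimisers.setup(rule, x)

# With a zero loss-gradient, the update applies only the decay term,
# so x2 == x .- 0.1f0 .* penalty_grad
state, x2 = Optimisers.update(state, x, zero(x))
```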
Cannot differentiate L2 regularized loss.
Package versions: