
Cannot take gradient of L2 regularization loss #2441

Open
Vilin97 opened this issue May 10, 2024 · 1 comment

Vilin97 commented May 10, 2024

Cannot differentiate L2 regularized loss.

using Flux, Zygote
s = Chain(Dense(2 => 100, softsign), Dense(100 => 2))
sqnorm(x) = sum(abs2, x)
gradient(s_ -> sum(sqnorm, Flux.params(s_)), s)
# Can't differentiate foreigncall expression $(Expr(:foreigncall, :(:jl_idset_put_idx), Any, svec(Any, Any, Int64), 0, :(:ccall), %77, %78, %79, %76)).

Package versions:

(@v1.11) pkg> st Flux
Status `~/.julia/environments/v1.11/Project.toml`
  [587475ba] Flux v0.14.15

(@v1.11) pkg> st Zygote
Status `~/.julia/environments/v1.11/Project.toml`
  [e88e6eb3] Zygote v0.6.69
  
julia> versioninfo()
Julia Version 1.11.0-beta1
Commit 08e1fc0abb9 (2024-04-10 08:40 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 80 × Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, broadwell)
Threads: 10 default, 0 interactive, 5 GC (on 80 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 10
mcabbott (Member) commented May 11, 2024

This ought to work, and does for me.

However, all things Flux.params are headed for extinction, see e.g. #2413. The current idiom for this is Optimisers.trainables... or in most cases, use WeightDecay instead:

julia> gradient(s_ -> sum(sqnorm, Flux.params(s_)), s)  # as above
((layers = ((weight = Float32[-0.18066745 -0.4179064; 0.3016829 -0.4228169; … ; -0.36133823 -0.23173195; 0.45555136 -0.12170375], bias = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], σ = nothing), (weight = Float32[-0.031820923 -0.41430357 … 0.33881077 0.35217345; -0.03208663 0.039828066 … -0.3371693 -0.34633902], bias = Float32[0.0, 0.0], σ = nothing)),),)

julia> import Optimisers

julia> gradient(s_ -> sum(sqnorm, Optimisers.trainables(s_)), s)  # new way, same numbers
((layers = ((weight = Float32[-0.18066745 -0.4179064; 0.3016829 -0.4228169; … ; -0.36133823 -0.23173195; 0.45555136 -0.12170375], bias = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], σ = nothing), (weight = Float32[-0.031820923 -0.41430357 … 0.33881077 0.35217345; -0.03208663 0.039828066 … -0.3371693 -0.34633902], bias = Float32[0.0, 0.0], σ = nothing)),),)

help?> Optimisers.WeightDecay
  WeightDecay(λ = 5e-4)

  Implements L_2 regularisation, also known as ridge regression, when composed with other rules as
  the first transformation in an OptimiserChain.

  It does this by adding λ .* x to the gradient. This is equivalent to adding λ/2 * sum(abs2, x)
  == λ/2 * norm(x)^2 to the loss.

  See also [SignDecay] for L_1 normalisation.
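
In practice the decay is used by composing it with the main optimisation rule rather than by adding a penalty term to the loss. A minimal sketch of that route with Flux's explicit-style training API (the learning rate, λ, and the dummy batch below are illustrative, not from this thread):

using Flux
import Optimisers

model = Chain(Dense(2 => 100, softsign), Dense(100 => 2))

# WeightDecay adds λ .* x to each parameter's gradient, which matches a
# λ/2 * sum(abs2, x) penalty in the loss, so no explicit sqnorm term is needed.
rule = Optimisers.OptimiserChain(Optimisers.WeightDecay(1e-4), Optimisers.Adam(1e-3))
opt_state = Flux.setup(rule, model)

x, y = rand(Float32, 2, 32), rand(Float32, 2, 32)        # illustrative dummy batch
grads = Flux.gradient(m -> sum(abs2, m(x) .- y), model)  # plain loss, no penalty term
Flux.update!(opt_state, model, grads[1])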

Ideally Optimisers.trainables would be accessible as Flux.trainables, and be included in this package's documentation.
