
Cannot take gradient of L2 regularization loss #2441

Open
Vilin97 opened this issue May 10, 2024 · 1 comment

Vilin97 commented May 10, 2024

Cannot differentiate L2 regularized loss.

using Flux, Zygote
s = Chain(Dense(2 => 100, softsign), Dense(100 => 2))
sqnorm(x) = sum(abs2, x)
gradient(s_ -> sum(sqnorm, Flux.params(s_)), s)
# Can't differentiate foreigncall expression $(Expr(:foreigncall, :(:jl_idset_put_idx), Any, svec(Any, Any, Int64), 0, :(:ccall), %77, %78, %79, %76)).

Package versions:

(@v1.11) pkg> st Flux
Status `~/.julia/environments/v1.11/Project.toml`
  [587475ba] Flux v0.14.15

(@v1.11) pkg> st Zygote
Status `~/.julia/environments/v1.11/Project.toml`
  [e88e6eb3] Zygote v0.6.69
  
julia> versioninfo()
Julia Version 1.11.0-beta1
Commit 08e1fc0abb9 (2024-04-10 08:40 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 80 × Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, broadwell)
Threads: 10 default, 0 interactive, 5 GC (on 80 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 10
mcabbott (Member) commented May 11, 2024

This ought to work, and does for me.

However, all things Flux.params are headed for extinction, see e.g. #2413. The current idiom for this is Optimisers.trainables... or in most cases, use WeightDecay instead:

julia> gradient(s_ -> sum(sqnorm, Flux.params(s_)), s)  # as above
((layers = ((weight = Float32[-0.18066745 -0.4179064; 0.3016829 -0.4228169; … ; -0.36133823 -0.23173195; 0.45555136 -0.12170375], bias = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], σ = nothing), (weight = Float32[-0.031820923 -0.41430357 … 0.33881077 0.35217345; -0.03208663 0.039828066 … -0.3371693 -0.34633902], bias = Float32[0.0, 0.0], σ = nothing)),),)

julia> import Optimisers

julia> gradient(s_ -> sum(sqnorm, Optimisers.trainables(s_)), s)  # new way, same numbers
((layers = ((weight = Float32[-0.18066745 -0.4179064; 0.3016829 -0.4228169; … ; -0.36133823 -0.23173195; 0.45555136 -0.12170375], bias = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], σ = nothing), (weight = Float32[-0.031820923 -0.41430357 … 0.33881077 0.35217345; -0.03208663 0.039828066 … -0.3371693 -0.34633902], bias = Float32[0.0, 0.0], σ = nothing)),),)

help?> Optimisers.WeightDecay
  WeightDecay(λ = 5e-4)

  Implements L_2 regularisation, also known as ridge regression, when composed with other rules as
  the first transformation in an OptimiserChain.

  It does this by adding λ .* x to the gradient. This is equivalent to adding λ/2 * sum(abs2, x)
  == λ/2 * norm(x)^2 to the loss.

  See also [SignDecay] for L_1 normalisation.
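
In practice the decay is used by composing it with the main optimisation rule rather than by adding a penalty term to the loss. A minimal sketch of that route with Flux's explicit-style training API (the learning rate, λ, and the dummy batch below are illustrative, not from this thread):

using Flux
import Optimisers

model = Chain(Dense(2 => 100, softsign), Dense(100 => 2))

# WeightDecay adds λ .* x to each parameter's gradient, which matches a
# λ/2 * sum(abs2, x) penalty in the loss, so no explicit sqnorm term is needed.
rule = Optimisers.OptimiserChain(Optimisers.WeightDecay(1e-4), Optimisers.Adam(1e-3))
opt_state = Flux.setup(rule, model)

x, y = rand(Float32, 2, 32), rand(Float32, 2, 32)        # illustrative dummy batch
grads = Flux.gradient(m -> sum(abs2, m(x) .- y), model)  # plain loss, no penalty term
Flux.update!(opt_state, model, grads[1])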

Ideally Optimisers.trainables would be accessible as Flux.trainables, and be included in this package's documentation.
