Dropped gradient only in reverse mode with cached values #1492

ChrisRackauckas · 2024-06-04T08:31:57Z

using Enzyme

not_decapode_f = begin
    function simulate()
        begin
            var"__•1" = Vector{Float64}(undef, 1)
            __V̇ = Vector{Float64}(undef, 1)
        end
        f(du, u, p, t) = begin
                begin
                    X = u[1]
                    k = p[1]
                    V = u[2]
                end
                var"•1" = var"__•1"
                V̇ = __V̇
                var"•1" .= (.-)(k)
                V̇ .= var"•1" .* X
                du[1] = V
                du[2] = V̇[1]
                nothing
            end
    end
end

du = [0.0,0.0]
u = [1.0,3.0]
p = [1.0]
f = not_decapode_f()
f(du,u,p,1.0)
df = Enzyme.make_zero(f)
d_du = Enzyme.make_zero(du)
d_u = Enzyme.make_zero(u)
dp = Enzyme.make_zero(p)

d_du .= 0; d_u .= 0; dp[1] = 1.0
Enzyme.autodiff(Enzyme.Forward, Duplicated(f, df), Enzyme.Duplicated(du, d_du),
                Enzyme.Duplicated(u,d_u), Enzyme.Duplicated(p,dp),
                Enzyme.Const(1.0))

d_du # [0.0,-1.0]
dp # [1.0]
du # [3.0,-1.0]

d_du .= 0; d_u .= 0; dp[1] = 1.0
Enzyme.autodiff(Enzyme.Reverse, Duplicated(f, df), Enzyme.Duplicated(du, d_du),
                Enzyme.Duplicated(u,d_u), Enzyme.Duplicated(p,dp),
                Enzyme.Const(1.0))

d_du # [0.0,0.0]
dp # [3.0]
du # [3.0,-1.0]

# Confirm with finite difference

ppet = [1 + 1e-7]
du2 = copy(du)
f(du,u,p,1.0)
f(du2,u,ppet,1.0)
(du2 - du) ./ 1e-7 # [0.0,-1.0000000005838672]

MWE of DARPA-ASKEM/sciml-service#177
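For reference, the forward-mode numbers can be reproduced by hand. This is a minimal sketch, assuming f reduces to du[1] = u[2] and du[2] = -p[1]*u[1], because the temporaries only ever hold -k and -k*X. Forward mode seeds the input tangents and reads the output tangent, so seeding p with 1.0 gives d(du)/dp = [0.0, -1.0], which matches the finite difference. f_model! and f_jvp are hypothetical names used only for this illustration.

```julia
# Hypothetical hand-derived model of the issue's f (assumption: equivalent to
# the generated code once the temporaries are inlined):
#   du[1] = u[2]
#   du[2] = -p[1] * u[1]
function f_model!(du, u, p)
    du[1] = u[2]
    du[2] = -p[1] * u[1]
    return nothing
end

# Forward mode (JVP): the *input* tangents (u_tan, p_tan) are the seed,
# and the return value is the output tangent J * [u_tan; p_tan].
function f_jvp(u, p, u_tan, p_tan)
    return [u_tan[2], -p_tan[1] * u[1] - p[1] * u_tan[1]]
end

u = [1.0, 3.0]
p = [1.0]

# Same seeding as in the issue: d_u = 0, dp = [1.0]
d_du = f_jvp(u, p, zeros(2), [1.0])

# One-sided finite difference in p, as in the issue
h = 1e-7
du = zeros(2)
du2 = zeros(2)
f_model!(du, u, p)
f_model!(du2, u, p .+ h)
fd = (du2 .- du) ./ h
```

Both d_du and fd come out as roughly [0.0, -1.0], in line with the forward-mode Enzyme result above.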

vchuravy · 2024-06-04T11:03:52Z

So dp being wrong is due to df still holding on to values from the earlier forward-mode call:

julia> Enzyme.autodiff(Enzyme.Forward, Duplicated(f, df), Enzyme.Duplicated(du, d_du),
                       Enzyme.Duplicated(u,d_u), Enzyme.Duplicated(p,dp),
                       Enzyme.Const(1.0))
()

julia> df.__V̇
1-element Vector{Float64}:
 -1.0

julia> df.var"__•1"
1-element Vector{Float64}:
 -1.0

If you reset df = Enzyme.make_zero(f), or manually zero out the temporaries in it:

julia> d_du .= 0; d_u .= 0; dp[1] = 1.0;

julia> df = Enzyme.make_zero(f)
f (generic function with 1 method)

julia> Enzyme.autodiff(Enzyme.Reverse, Duplicated(f, df), Enzyme.Duplicated(du, d_du),
                       Enzyme.Duplicated(u,d_u), Enzyme.Duplicated(p,dp),
                       Enzyme.Const(1.0))
((nothing, nothing, nothing, nothing),)

julia> d_du # [0.0,0.0]
2-element Vector{Float64}:
 0.0
 0.0

julia> dp # [3.0]
1-element Vector{Float64}:
 1.0

julia> du # [3.0,-1.0]
2-element Vector{Float64}:
  3.0
 -1.0

Then dp is correct, but d_du still isn't.
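The dp = [3.0] from the first reverse call can be reproduced with a hand-written reverse sweep. This is only a model, assuming adjoints are accumulated into whatever the shadow buffers already contain, which is consistent with the stale -1.0 entries shown above. reverse_sweep!, s_V (standing in for df.__V̇) and s_1 (standing in for df.var"__•1") are hypothetical names.

```julia
# Hypothetical model of the reverse sweep through the issue's f, for intuition
# only. Assumption: adjoints are *accumulated* into the existing shadow
# buffers, so stale values left in s_V / s_1 by an earlier call leak into dp.
function reverse_sweep!(d_du, d_u, dp, s_V, s_1, u, p)
    X, k = u[1], p[1]
    tmp1 = -k                      # primal value held in •1
    # adjoint of: du[2] = V̇[1]
    s_V[1] += d_du[2]; d_du[2] = 0.0
    # adjoint of: du[1] = V   (V = u[2])
    d_u[2] += d_du[1]; d_du[1] = 0.0
    # adjoint of: V̇ .= •1 .* X
    s_1[1] += s_V[1] * X
    d_u[1] += s_V[1] * tmp1
    s_V[1] = 0.0
    # adjoint of: •1 .= -k
    dp[1] += -s_1[1]
    s_1[1] = 0.0
    return nothing
end

u, p = [1.0, 3.0], [1.0]

# Stale shadows left over from the forward-mode call ([-1.0] in both buffers)
d_du, d_u, dp = [0.0, 0.0], [0.0, 0.0], [1.0]
reverse_sweep!(d_du, d_u, dp, [-1.0], [-1.0], u, p)
dp_stale = copy(dp)     # [3.0]: 1.0 seed + 2.0 leaked from the stale buffers

# Freshly zeroed shadows, as with df = Enzyme.make_zero(f)
d_du, d_u, dp = [0.0, 0.0], [0.0, 0.0], [1.0]
reverse_sweep!(d_du, d_u, dp, [0.0], [0.0], u, p)
dp_fresh = copy(dp)     # [1.0]: the seed is untouched
```

With zeroed buffers dp simply keeps its seed of 1.0 and d_du stays [0.0, 0.0], since nothing was seeded on the output side.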

ChrisRackauckas · 2024-06-04T12:58:30Z

d_du is the one I have to actually use, so it's a bit worrisome 😅

vchuravy · 2024-06-04T14:08:22Z

Yeah, just letting you know that in any case you will have to zero the temporaries in df.

ChrisRackauckas · 2024-06-04T17:08:43Z

Does Enzyme have a utility to fill zero?
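Enzyme.make_zero(f), used above, builds a fresh shadow. To reuse an existing one instead, an in-place reset can be sketched in plain Julia. zero_captured! is a hypothetical helper, not part of Enzyme's API, and it assumes the temporaries are captured directly as fields of the closure (as df.__V̇ in the transcript above suggests) rather than wrapped in a Core.Box; nested structs would need recursion.

```julia
# Hypothetical helper (not an Enzyme function): zero, in place, every
# floating-point array captured by a closure, so a shadow such as df can be
# reused between autodiff calls.
# Assumption: captured arrays are plain fields of the closure object.
function zero_captured!(g)
    for name in fieldnames(typeof(g))
        x = getfield(g, name)
        if x isa AbstractArray{<:AbstractFloat}
            fill!(x, zero(eltype(x)))
        end
    end
    return g
end

# Small stand-in for the generated closure: one preallocated temporary
function make_closure()
    buf = Vector{Float64}(undef, 1)
    h(x) = (buf .= -x; buf[1])
    return h
end

g = make_closure()
g(1.0)              # buf now holds [-1.0]
zero_captured!(g)   # buf is reset to [0.0] without reallocating
```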

wsmoses · 2024-06-06T20:51:17Z

I'll see if I can look at this tomorrow when flying back to the US.

wsmoses · 2024-06-06T23:27:55Z

@ChrisRackauckas this is a calling convention issue on your end.

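In reverse mode you need to seed the shadow of the output with 1, not the shadow of the input. Here f writes its result into du, so the seed goes into d_du, and the derivative is read back from dp.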
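The convention can be checked on a hand-derived VJP. This is a minimal sketch, assuming again that f reduces to du[1] = u[2], du[2] = -p[1]*u[1], and that the output seed is consumed (zeroed) by the sweep; f_vjp! is a hypothetical name. Seeding dp alone leaves it untouched, seeding du[2] gives dp = [-1.0], and seeding each output in turn recovers the Jacobian one row at a time.

```julia
# Hypothetical hand-derived reverse mode (VJP) of the issue's f, assuming
#   du[1] = u[2],  du[2] = -p[1] * u[1].
# The *output* cotangent d_du is the seed; the input shadows d_u and dp
# accumulate J' * d_du, and d_du is zeroed afterwards.
function f_vjp!(d_du, d_u, dp, u, p)
    d_u[2] += d_du[1]
    d_u[1] += -p[1] * d_du[2]
    dp[1] += -u[1] * d_du[2]
    fill!(d_du, 0.0)
    return nothing
end

u, p = [1.0, 3.0], [1.0]

# Wrong convention: seed the input, leave the output seed at zero
d_du, d_u, dp = [0.0, 0.0], [0.0, 0.0], [1.0]
f_vjp!(d_du, d_u, dp, u, p)
dp_input_seed = copy(dp)        # still [1.0]: nothing propagated

# Right convention: seed du[2], read d(du[2])/dp from dp
d_du, d_u, dp = [0.0, 1.0], [0.0, 0.0], [0.0]
f_vjp!(d_du, d_u, dp, u, p)     # dp == [-1.0], d_u == [-1.0, 0.0]

# Full Jacobian, one reverse sweep per output; columns are u[1], u[2], p[1]
J = zeros(2, 3)
for i in 1:2
    seed = zeros(2); seed[i] = 1.0
    gu, gp = zeros(2), zeros(1)
    f_vjp!(seed, gu, gp, u, p)
    J[i, :] = [gu; gp]
end
```

Forward mode yields one Jacobian column per seeded input, reverse mode one row per seeded output, which is why the seed has to sit on du here.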

using Enzyme

not_decapode_f = begin
    function simulate()
        begin
            var"__•1" = Vector{Float64}(undef, 1)
            __V̇ = Vector{Float64}(undef, 1)
        end
        f(du, u, p, t) = begin
                begin
                    X = u[1]
                    k = p[1]
                    V = u[2]
                end
                var"•1" = var"__•1"
                V̇ = __V̇
                var"•1" .= (.-)(k)
                V̇ .= var"•1" .* X
                du[1] = V
                du[2] = V̇[1]
                nothing
            end
    end
end

du = [0.0,0.0]
u = [1.0,3.0]
p = [1.0]
f = not_decapode_f()
f(du,u,p,1.0)
df = Enzyme.make_zero(f)
d_du = Enzyme.make_zero(du)
d_u = Enzyme.make_zero(u)
dp = Enzyme.make_zero(p)

df = Enzyme.make_zero(f)
d_du .= 0; d_u .= 0; dp[1] = 1.0
Enzyme.autodiff(Enzyme.Reverse, Duplicated(f, df), Enzyme.Duplicated(du, d_du),
                Enzyme.Duplicated(u,d_u), Enzyme.Duplicated(p,dp),
                Enzyme.Const(1.0))

@show d_du # [0.0,0.0]
@show dp # [1.0]: unchanged, since only the input shadow was seeded
@show du # [3.0,-1.0]


df = Enzyme.make_zero(f)
# seed the output shadow to compute the gradient of du[2]
d_du = [0.0, 1.0]; d_u .= 0; dp[1] = 0.0
Enzyme.autodiff(Enzyme.Reverse, Duplicated(f, df), Enzyme.Duplicated(du, d_du),
                Enzyme.Duplicated(u,d_u), Enzyme.Duplicated(p,dp),
                Enzyme.Const(1.0))

# dp now holds d(du[2])/dp, which is what the finite difference below computes in its second entry
@show dp # dp = [-1.0]

# Confirm with finite difference

ppet = [1 + 1e-7]
du2 = copy(du)
f(du,u,p,1.0)
f(du2,u,ppet,1.0)
@show (du2 - du) ./ 1e-7 # [0.0,-1.0000000005838672]

ChrisRackauckas · 2024-06-06T23:30:56Z

oh duh 🤦

wsmoses closed this as completed Jun 6, 2024
