Using AbstractFloat instead of Float64 in warning check for slow conv #475

Open
wants to merge 1 commit into base: master
Conversation

@gabrielpreviato (Contributor) commented Feb 18, 2023

PR Checklist

  • Tests are added
  • Documentation, if applicable

#383 silenced some warnings that are not necessarily useful (as discussed in that PR), but kept the warning for Float64 mixed with other types.

But with support for half-precision floats, other combinations can happen and go unnoticed, such as a Float16 weight with a Float32 input, or a Float32 weight with an Int input ("oh no, I forgot to convert my integers to floats").

This PR changes the type check from Float64 to AbstractFloat, so that a warning is issued for these unintentional and unusual combinations, while still suppressing the warning for ForwardDiff.Dual, which I think was the main purpose of the previous PR.
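
To make the intent concrete, here is a rough sketch of the behaviour change in terms of the yT, T1 and T2 eltypes that appear in the warnings below. This is an illustration of the semantics only, not the actual code in conv.jl or the exact diff, and the should_warn_* names are made up for this example:

```julia
# Illustrative sketch only; not the actual NNlib source or the exact diff.

# Roughly the current behaviour: warn only when the output eltype is Float64
# and the eltypes don't all match.
should_warn_before(yT, T1, T2) = yT == Float64 && !(yT == T1 == T2)

# Roughly this PR's behaviour: warn for any mismatched combination with a plain
# AbstractFloat output. ForwardDiff.Dual is <: Real but not <: AbstractFloat,
# so it still stays silent.
should_warn_after(yT, T1, T2) = yT <: AbstractFloat && !(yT == T1 == T2)
```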

Some examples

Current NNlib (only shows a warning when there is a Float64):

julia> x = rand(Float16, 5, 5, 1, 1)
5×5×1×1 Array{Float16, 4}:
[:, :, 1, 1] =
 0.3955  0.76    0.8823  0.4844  0.593
 0.2158  0.51    0.9277  0.2725  0.2163
 0.547   0.8364  0.958   0.939   0.3027
 0.1377  0.7285  0.4229  0.943   0.579
 0.7437  0.5874  0.805   0.146   0.269

julia> w = rand(Int8, 3, 3, 1, 1)
3×3×1×1 Array{Int8, 4}:
[:, :, 1, 1] =
 -68  120  -77
 -75   90   11
   3   92  -25

julia> conv(x, w)
3×3×1×1 Array{Float16, 4}:
[:, :, 1, 1] =
  34.44  119.1    61.03
 101.75   29.06  116.0
  60.0    87.0    46.62

julia> w = rand(Float32, 3, 3, 1, 1)
3×3×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 0.476644  0.732119  0.917347
 0.350704  0.753561  0.0978633
 0.633826  0.753227  0.0496203

julia> conv(x, w)
3×3×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 3.45243  3.77007  2.86686
 2.86372  3.45715  2.65007
 3.47114  3.27677  2.87541

julia> w = rand(Float64, 3, 3, 1, 1)
3×3×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 0.220391  0.259412   0.027865
 0.855732  0.353315   0.893624
 0.531092  0.0226233  0.315492

julia> conv(x, w)
┌ Warning: Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes.
│   yT = Float64
│   T1 = Float16
│   T2 = Float64
└ @ NNlib ~/.julia/packages/NNlib/TZPiH/src/conv.jl:192
3×3×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 2.22078  2.01215  2.05155
 2.46237  2.55374  2.24465
 1.79309  2.64892  1.81043

This PR (shows a warning for any mixture of different float types):

PS: I reset the session between these tests, since the warnings use maxlog=1 and are only printed once per session.
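
(maxlog is a keyword of Julia's standard logging macros that caps how many times a given log statement is emitted, which is why a fresh session is needed to see the warning again. A minimal illustration:)

```julia
for i in 1:3
    # Emitted only once, even though the loop runs three times.
    @warn "Slow fallback implementation invoked for conv!" maxlog=1
end
```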

julia> x = rand(Float16, 5, 5, 1, 1)
5×5×1×1 Array{Float16, 4}:
[:, :, 1, 1] =
 0.474   0.5884  0.7637   0.6035  0.6396
 0.0762  0.645   0.0952   0.7197  0.818
 0.3394  0.1221  0.543    0.7017  0.767
 0.7173  0.961   0.08936  0.2783  0.3022
 0.678   0.5005  0.7104   0.965   0.4219

julia> w = rand(Int8, 3, 3, 1, 1)
3×3×1×1 Array{Int8, 4}:
[:, :, 1, 1] =
  81  -48  -76
 -15  -41   58
 -23   72  -56

julia> conv(x, w)
┌ Warning: Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes.
│   yT = Float16
│   T1 = Float16
│   T2 = Int8
└ @ NNlib ~/NNlib.jl/src/conv.jl:192
3×3×1×1 Array{Float16, 4}:
[:, :, 1, 1] =
 -12.91    52.38  -63.06
 -46.8   -126.2    23.25
 -39.9     70.0   -74.44

julia> w = rand(Float32, 3, 3, 1, 1)
3×3×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 0.885747  0.745852  0.161355
 0.837498  0.757606  0.457403
 0.86357   0.687177  0.457069

julia> conv(x, w)
┌ Warning: Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes.
│   yT = Float32
│   T1 = Float16
│   T2 = Float32
└ @ NNlib ~/NNlib.jl/src/conv.jl:192
3×3×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 2.20351  3.15128  3.26581
 2.8175   3.43277  3.7439
 2.4886   3.27672  3.43359

julia> w = rand(Float64, 3, 3, 1, 1)
3×3×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 0.191498  0.043293  0.34513
 0.417291  0.227745  0.510484
 0.627288  0.680126  0.0511033

julia> conv(x, w)
┌ Warning: Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes.
│   yT = Float64
│   T1 = Float16
│   T2 = Float64
└ @ NNlib ~/NNlib.jl/src/conv.jl:192
3×3×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 1.83448  1.69441  1.53605
 1.48041  2.10666  2.06308
 1.76039  1.72283  2.31743

julia> f = x -> sum(conv(x, w))
#5 (generic function with 1 method)

julia> ForwardDiff.gradient(f, x)
5×5×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 0.998969  1.54049  2.26595  1.26698  0.725466
 1.57489   2.57254  4.01191  2.43702  1.43937
 2.10452   3.83771  6.0199   3.91539  2.18219
 1.10555   2.29723  3.75395  2.64841  1.45672
 0.529626  1.26517  2.008    1.47837  0.742825

@ToucheSir (Member) commented

I like the idea, but do we support half precision in the fast path? I didn't think we did, since it relies on BLAS and that's usually 32/64-bit only.

@gabrielpreviato (Contributor, Author) commented

Hmm, I think I've read somewhere something about "half-precision support", but indeed, BLAS only supports 32/64-bit.

Still, if you mix Float32 and Float16 (which is what happened to me and prompted this PR), there is no warning. Maybe for now, instead of checking only for Float64, we check for Float64 and Float32?
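
In the same illustrative style as the sketch in the description (again, not actual NNlib code), that narrower check would be roughly:

```julia
# Hypothetical alternative: warn only when the output eltype is Float64 or Float32
# and the eltypes don't all match.
should_warn(yT, T1, T2) = (yT == Float64 || yT == Float32) && !(yT == T1 == T2)
```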

@ToucheSir (Member) commented

There should be no warning, because we don't have another implementation of convolutions for those types! Or are you saying that we should be dissuading people from mixing FP16+FP32 altogether?

@gabrielpreviato (Contributor, Author) commented

I don't think we should dissuade people from mixing types or trying unusual combinations, but I think the warning is valid and useful, since you can end up with this mix by mistake. Flux currently also issues a warning for a possible mix of Float16 and Float32.

https://github.com/FluxML/Flux.jl/blob/cebc0d931a3678afdcd04040858f5541bf5ff23b/src/layers/stateless.jl#L56-L60

And if you have scalar indexing disabled, you get a scalar indexing error because of how the direct convolution is implemented. If you are new to the Julia ecosystem, it is not so straightforward to figure out that the problem is type mixing, IMO.

julia> x = CUDA.rand(Float32, 5, 5, 1, 1)
5×5×1×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 0.601529  0.142086  0.612069  0.094501  0.699099
 0.821472  0.202396  0.484461  0.337047  0.106021
 0.586893  0.62307   0.168788  0.329471  0.8758
 0.213742  0.770866  0.132256  0.489098  0.570035
 0.392795  0.27233   0.656506  0.795186  0.618649

julia> w = CUDA.rand(Float16, 3, 3, 1, 1)
3×3×1×1 CuArray{Float16, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 0.4766  0.009766  0.8623
 0.6084  0.0       0.001953
 0.8877  0.2754    0.6045

julia> conv(x, w)
ERROR: TaskFailedException

    nested task error: Scalar indexing is disallowed.
    Invocation of getindex resulted in scalar indexing of a GPU array.
    This is typically caused by calling an iterating implementation of a method.
    Such implementations *do not* execute on the GPU, but very slowly on the CPU,
    and therefore are only permitted from the REPL for prototyping purposes.
    If you did intend to index this array, annotate the caller with @allowscalar.
    Stacktrace:
     [1] error(s::String)
       @ Base ./error.jl:35
     [2] assertscalar(op::String)
       @ GPUArraysCore ~/.julia/packages/GPUArraysCore/B3xv7/src/GPUArraysCore.jl:100
     [3] getindex(::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, ::Int64, ::Int64, ::Int64, ::Int64, ::Vararg{Int64})
       @ GPUArrays ~/.julia/packages/GPUArrays/5wTN2/src/host/indexing.jl:9
     [4] getindex
       @ ./subarray.jl:282 [inlined]
     [5] conv_direct!(y::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, x::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, w::CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}, ::Val{(3, 3, 1)}, ::Val{1}, ::Val{(0, 0, 0, 0, 0, 0)}, ::Val{(1, 1, 1)}, ::Val{(1, 1, 1)}, fk::Val{false}; alpha::Float32, beta::Bool)
       @ NNlib ~/.julia/packages/NNlib/TZPiH/src/impl/conv_direct.jl:104
     [6] conv_direct!(y::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, x::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, w::CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; alpha::Float32, beta::Bool)
       @ NNlib ~/.julia/packages/NNlib/TZPiH/src/impl/conv_direct.jl:50
     [7] conv_direct!
       @ ~/.julia/packages/NNlib/TZPiH/src/impl/conv_direct.jl:47 [inlined]
     [8] (::NNlib.var"#308#312"{Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, DenseConvDims{3, 3, 3, 6, 3}, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}})()
       @ NNlib ./threadingconstructs.jl:258
Stacktrace:
  [1] sync_end(c::Channel{Any})
    @ Base ./task.jl:436
  [2] macro expansion
    @ ./task.jl:455 [inlined]
  [3] conv!(out::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, in1::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, in2::CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/TZPiH/src/conv.jl:205
  [4] conv!
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:185 [inlined]
  [5] #conv!#258
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:145 [inlined]
  [6] conv!
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:140 [inlined]
  [7] conv(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, w::CuArray{Float16, 4, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{2, 2, 2, 4, 2}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/TZPiH/src/conv.jl:88
  [8] conv
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:83 [inlined]
  [9] #conv#231
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
 [10] conv(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, w::CuArray{Float16, 4, CUDA.Mem.DeviceBuffer})
    @ NNlib ~/.julia/packages/NNlib/TZPiH/src/conv.jl:50
 [11] top-level scope
    @ REPL[3]:1

@ToucheSir (Member) commented Feb 19, 2023

The reason mixing <64-bit and 64-bit types is so insidious is that it's so easy to write Julia code which promotes to the latter. That's less of a concern with 16/32, because one has to explicitly request arrays with those eltypes. It's also not as if users have a different option for 16-bit conv operations, unlike the 32- and 64-bit cases. IMO the scalar indexing problem is a separate one and should be addressed by us adding a check for GPU arrays.

Edit: I might support having the fallback warning as a way to tell users that FP16 does not have a fast path on CPU, but again, that's a different discussion from trying to extend the behaviour of the current warning (which, as far as I know, was meant as a "hey, this isn't hitting the BLAS path" warning) to different eltype combinations.
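
For instance, a hypothetical helper along these lines (check_conv_eltypes is not an existing NNlib function; the only real dependency is AbstractGPUArray from GPUArraysCore):

```julia
using GPUArraysCore: AbstractGPUArray

# Hypothetical helper, not part of NNlib: fail early with a clear message instead of
# letting conv_direct! fall into scalar indexing on GPU arrays with mixed eltypes.
function check_conv_eltypes(x::AbstractArray, w::AbstractArray)
    if (x isa AbstractGPUArray || w isa AbstractGPUArray) && eltype(x) !== eltype(w)
        throw(ArgumentError(
            "conv on GPU arrays with mixed eltypes $(eltype(x)) and $(eltype(w)) " *
            "has no fast path and would hit the scalar-indexing fallback; " *
            "convert both arrays to a common eltype first."))
    end
    return nothing
end
```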
