SamePad() with even kernel dimensions does not work (only in CUDA) #1605

Closed
bhvieira opened this issue May 26, 2021 · 7 comments

@bhvieira (Contributor)

Tested on Flux v0.12.3

It works on the CPU, but on CUDA the following happens:

MWE:

gpu(Conv((4,4), 1=>1, pad = SamePad()))(gpu(randn(Float32, 2,2,1,1)))

ERROR: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
[1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
@ CUDA.CUDNN ~/.julia/packages/CUDA/k52QH/lib/cudnn/error.jl:22
[2] macro expansion
@ ~/.julia/packages/CUDA/k52QH/lib/cudnn/error.jl:39 [inlined]
[3] cudnnConvolutionForward(handle::Ptr{Nothing}, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float32, 4}, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, w::CUDA.CuArray{Float32, 4}, convDesc::CUDA.CUDNN.cudnnConvolutionDescriptor, algo::CUDA.CUDNN.cudnnConvolutionFwdAlgo_t, workSpace::CUDA.CuArray{UInt8, 1}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float32, 4})
@ CUDA.CUDNN ~/.julia/packages/CUDA/k52QH/lib/utils/call.jl:26
[4] macro expansion
@ ~/.julia/packages/CUDA/k52QH/lib/cudnn/convolution.jl:105 [inlined]
[5] macro expansion
@ ~/.julia/packages/CUDA/k52QH/lib/utils/call.jl:144 [inlined]
[6] cudnnConvolutionForwardAD(w::CUDA.CuArray{Float32, 4}, x::CUDA.CuArray{Float32, 4}, bias::Nothing, z::CUDA.CuArray{Float32, 4}; y::CUDA.CuArray{Float32, 4}, activation::CUDA.CUDNN.cudnnActivationMode_t, convDesc::CUDA.CUDNN.cudnnConvolutionDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, zDesc::CUDA.CUDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
@ CUDA.CUDNN ~/.julia/packages/CUDA/k52QH/lib/cudnn/convolution.jl:103
[7] #cudnnConvolutionForwardWithDefaults#622
@ ~/.julia/packages/CUDA/k52QH/lib/cudnn/convolution.jl:96 [inlined]
[8] #cudnnConvolutionForward!#621
@ ~/.julia/packages/CUDA/k52QH/lib/cudnn/convolution.jl:53 [inlined]
[9] conv!(y::CUDA.CuArray{Float32, 4}, x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (4, 4), 1, 1, (1, 1), (2, 1, 2, 1), (1, 1), false}; alpha::Int64, beta::Int64, algo::Int64)
@ NNlibCUDA ~/.julia/packages/NNlibCUDA/z3edd/src/cudnn/conv.jl:48
[10] conv!
@ ~/.julia/packages/NNlibCUDA/z3edd/src/cudnn/conv.jl:41 [inlined]
[11] conv(x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (4, 4), 1, 1, (1, 1), (2, 1, 2, 1), (1, 1), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ NNlib ~/.julia/packages/NNlib/ev8gq/src/conv.jl:91
[12] conv(x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (4, 4), 1, 1, (1, 1), (2, 1, 2, 1), (1, 1), false})
@ NNlib ~/.julia/packages/NNlib/ev8gq/src/conv.jl:89
[13] (::Conv{2, 4, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}})(x::CUDA.CuArray{Float32, 4})
@ Flux ~/.julia/packages/Flux/6o4DQ/src/layers/conv.jl:157
[14] top-level scope
@ REPL[35]:1
[15] top-level scope
@ ~/.julia/packages/CUDA/k52QH/src/initialization.jl:81

This works:

gpu(Conv((5,5), 1=>1, pad = SamePad()))(gpu(randn(Float32, 2,2,1,1)))
2×2×1×1 CUDA.CuArray{Float32, 4}:
[:, :, 1, 1] =
  0.122714   0.129015
 -0.173312  -0.874716
@bhvieira (Contributor, Author)

It appears to be related to specifying all four padding dimensions:

gpu(Conv((4,4), 1=>1, pad = (5,2,3,4)))(gpu(randn(Float32, 2,2,1,1)))

@DrChainsaw (Contributor)

Isn't this just because CUDA (cuDNN) does not support asymmetric padding? An even kernel size + SamePad() implies asymmetric padding.

#775
JuliaGPU/CUDA.jl#128
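
For reference, a quick sketch of where the asymmetry comes from (assuming Flux v0.12.3, where a Conv layer stores its expanded pad tuple in the pad field):

using Flux

# SamePad() has to distribute k - 1 = 3 pixels of padding per spatial dimension
# of the 4×4 kernel; 3 cannot be split evenly, so the stored pad tuple is
# asymmetric -- the same (2, 1, 2, 1) that shows up in the DenseConvDims of the
# stack trace above.
c = Conv((4, 4), 1 => 1, pad = SamePad())
c.pad  # (2, 1, 2, 1): 2 before and 1 after in each spatial dimension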

What happens is that NNlibCUDA attempts to change the padding to be symmetric (which imo is the wrong thing to do), but by that point the caller has already supplied an output buffer whose size assumes the asymmetric case.

The right way imo, and what I think all other libs do, is to manually pad the input and at the same time decrease the padding parameter so that it becomes symmetric.

I think there has been a bit of controversy as to where this should be done, which seems to be why it has fallen through the cracks. For instance, if one just does it in NNlibCUDA\src\cudnn\conv.jl:41, the caller might be surprised that x (the input features) has changed size. One can of course copy x and pad the copy, but that is not ideal either, since in the vast majority of cases the caller does not care whether x is changed.
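
A minimal user-level sketch of that manual-padding trick, emulating SamePad() for the 4×4 kernel from the MWE above (sizes hard-coded for illustration):

using Flux

# Pad the 2×2 input by hand: 2 zeros before and 1 after in each spatial
# dimension (matching the (2, 1, 2, 1) pad SamePad() computes), then run the
# layer with pad = 0 so cuDNN only ever sees symmetric (zero) padding.
x  = randn(Float32, 2, 2, 1, 1)
xp = zeros(Float32, 5, 5, 1, 1)   # 2 (before) + 2 (input) + 1 (after) per spatial dim
xp[3:4, 3:4, :, :] .= x           # original block offset by the "before" padding
c  = Conv((4, 4), 1 => 1, pad = 0)
y  = gpu(c)(gpu(xp))              # 2×2×1×1 output, the same shape SamePad() gives on the CPU

For comparison, this is what currently happens when the asymmetric pad reaches cuDNN: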

julia> cc = Conv((1,1), 1=>1, pad=(1,0,1,0)) |> gpu
Conv((1, 1), 1=>1)

julia> cc(ones(Float32, 1,1,1,1) |> gpu)
┌ Warning: cuDNN does not support asymmetric padding; defaulting to symmetric choice
└ @ NNlibCUDA E:\Programs\julia\.julia\dev\NNlibCUDA\src\cudnn\cudnn.jl:10
┌ Warning: No valid algorithm found, probably bad params for convolution.
└ @ CUDA.CUDNN E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:222
ERROR: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
  [1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
    @ CUDA.CUDNN E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\error.jl:22
  [2] macro expansion
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\error.jl:39 [inlined] 
  [3] cudnnConvolutionForward(handle::Ptr{Nothing}, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float32, 4}, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, w::CUDA.CuArray{Float32, 4}, convDesc::CUDA.CUDNN.cudnnConvolutionDescriptor, algo::CUDA.CUDNN.cudnnConvolutionFwdAlgo_t, workSpace::CUDA.CuArray{UInt8, 1}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float32, 4})
    @ CUDA.CUDNN E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\utils\call.jl:26
  [4] macro expansion
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:105 [inlined]
  [5] macro expansion
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\utils\call.jl:144 [inlined]
  [6] cudnnConvolutionForwardAD(w::CUDA.CuArray{Float32, 4}, x::CUDA.CuArray{Float32, 4}, bias::Nothing, z::CUDA.CuArray{Float32, 4}; y::CUDA.CuArray{Float32, 4}, activation::CUDA.CUDNN.cudnnActivationMode_t, convDesc::CUDA.CUDNN.cudnnConvolutionDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, zDesc::CUDA.CUDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
    @ CUDA.CUDNN E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:103
  [7] #cudnnConvolutionForwardWithDefaults#626
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:96 [inlined]
  [8] #cudnnConvolutionForward!#625
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:53 [inlined]
  [9] conv!(y::CUDA.CuArray{Float32, 4}, x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (1, 1), 1, 1, (1, 1), (1, 0, 1, 0), (1, 1), false}; alpha::Int64, beta::Int64, algo::Int64)
    @ NNlibCUDA E:\Programs\julia\.julia\dev\NNlibCUDA\src\cudnn\conv.jl:41
 [10] conv!
    @ E:\Programs\julia\.julia\dev\NNlibCUDA\src\cudnn\conv.jl:34 [inlined]
 [11] conv(x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (1, 1), 1, 1, (1, 1), (1, 0, 1, 0), (1, 1), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib E:\Programs\julia\.julia\packages\NNlib\ev8gq\src\conv.jl:91
 [12] conv(x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (1, 1), 1, 1, (1, 1), (1, 0, 1, 0), (1, 1), false})
    @ NNlib E:\Programs\julia\.julia\packages\NNlib\ev8gq\src\conv.jl:89
 [13] (::Conv{2, 4, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}})(x::CUDA.CuArray{Float32, 4})
    @ Flux E:\Programs\julia\.julia\packages\Flux\6o4DQ\src\layers\conv.jl:157
 [14] top-level scope
    @ REPL[13]:1

@bhvieira (Contributor, Author)

That should be it then, @DrChainsaw. I remember seeing something like that some time ago (years ago already?).
So, closing, since it's a known issue.
Does this also bite Keras/PyTorch?

@DrChainsaw (Contributor)

I'm pretty sure both Keras and PyTorch do the manual padding trick, and the dilemma between copying and surprising mutation should exist there as well afaik. I did look into the TensorFlow C++ code long ago, but alas I don't remember what I found. Chances are I would have remembered it if it had been something other than a copy, though.

@ToucheSir (Member)

To my knowledge, PyTorch doesn't have a concept of automatically calculated "same" padding and requires specifying the amount for each dimension manually. I had a skim through some of the C++ code mentioned, and AFAICT allocating an enlarged buffer is necessary to get uneven same padding working with cuDNN.

@ToucheSir (Member)

PyTorch 1.9 now supports "same" padding and warns when the calculated padding is asymmetric: pytorch/pytorch#45667. Should we do the same?
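
A minimal sketch of that kind of check, applied to the expanded (l1, r1, l2, r2, ...) pad tuple seen above (hypothetical helper, not existing Flux/NNlib code):

# Warn when a pad tuple of the form (l1, r1, l2, r2, ...) is asymmetric,
# similar to what PyTorch 1.9 does for its "same" padding mode.
function warn_if_asymmetric(pad::NTuple{N, Int}) where {N}
    for i in 1:2:N
        if pad[i] != pad[i + 1]
            @warn "Asymmetric padding $pad is not supported by cuDNN; consider padding the input manually."
            return true
        end
    end
    return false
end

warn_if_asymmetric((2, 1, 2, 1))  # true: SamePad()'s result for a 4×4 kernel
warn_if_asymmetric((2, 2, 2, 2))  # false: symmetric, fine for cuDNN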

@bhvieira (Contributor, Author)

It's a bit niche, so perhaps leave it in the backlog in #1431.
