SamePad() with even kernel dimensions does not work (only in CUDA) #1605

Closed
bhvieira opened this issue May 26, 2021 · 7 comments

@bhvieira (Contributor)

Tested on Flux v0.12.3

It works on the CPU, but on CUDA the following happens:

MWE:

gpu(Conv((4,4), 1=>1, pad = SamePad()))(gpu(randn(Float32, 2,2,1,1)))

ERROR: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
[1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
@ CUDA.CUDNN ~/.julia/packages/CUDA/k52QH/lib/cudnn/error.jl:22
[2] macro expansion
@ ~/.julia/packages/CUDA/k52QH/lib/cudnn/error.jl:39 [inlined]
[3] cudnnConvolutionForward(handle::Ptr{Nothing}, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float32, 4}, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, w::CUDA.CuArray{Float32, 4}, convDesc::CUDA.CUDNN.cudnnConvolutionDescriptor, algo::CUDA.CUDNN.cudnnConvolutionFwdAlgo_t, workSpace::CUDA.CuArray{UInt8, 1}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float32, 4})
@ CUDA.CUDNN ~/.julia/packages/CUDA/k52QH/lib/utils/call.jl:26
[4] macro expansion
@ ~/.julia/packages/CUDA/k52QH/lib/cudnn/convolution.jl:105 [inlined]
[5] macro expansion
@ ~/.julia/packages/CUDA/k52QH/lib/utils/call.jl:144 [inlined]
[6] cudnnConvolutionForwardAD(w::CUDA.CuArray{Float32, 4}, x::CUDA.CuArray{Float32, 4}, bias::Nothing, z::CUDA.CuArray{Float32, 4}; y::CUDA.CuArray{Float32, 4}, activation::CUDA.CUDNN.cudnnActivationMode_t, convDesc::CUDA.CUDNN.cudnnConvolutionDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, zDesc::CUDA.CUDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
@ CUDA.CUDNN ~/.julia/packages/CUDA/k52QH/lib/cudnn/convolution.jl:103
[7] #cudnnConvolutionForwardWithDefaults#622
@ ~/.julia/packages/CUDA/k52QH/lib/cudnn/convolution.jl:96 [inlined]
[8] #cudnnConvolutionForward!#621
@ ~/.julia/packages/CUDA/k52QH/lib/cudnn/convolution.jl:53 [inlined]
[9] conv!(y::CUDA.CuArray{Float32, 4}, x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (4, 4), 1, 1, (1, 1), (2, 1, 2, 1), (1, 1), false}; alpha::Int64, beta::Int64, algo::Int64)
@ NNlibCUDA ~/.julia/packages/NNlibCUDA/z3edd/src/cudnn/conv.jl:48
[10] conv!
@ ~/.julia/packages/NNlibCUDA/z3edd/src/cudnn/conv.jl:41 [inlined]
[11] conv(x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (4, 4), 1, 1, (1, 1), (2, 1, 2, 1), (1, 1), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ NNlib ~/.julia/packages/NNlib/ev8gq/src/conv.jl:91
[12] conv(x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (4, 4), 1, 1, (1, 1), (2, 1, 2, 1), (1, 1), false})
@ NNlib ~/.julia/packages/NNlib/ev8gq/src/conv.jl:89
[13] (::Conv{2, 4, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}})(x::CUDA.CuArray{Float32, 4})
@ Flux ~/.julia/packages/Flux/6o4DQ/src/layers/conv.jl:157
[14] top-level scope
@ REPL[35]:1
[15] top-level scope
@ ~/.julia/packages/CUDA/k52QH/src/initialization.jl:81

This works:

gpu(Conv((5,5), 1=>1, pad = SamePad()))(gpu(randn(Float32, 2,2,1,1)))
2×2×1×1 CUDA.CuArray{Float32, 4}:
[:, :, 1, 1] =
  0.122714   0.129015
 -0.173312  -0.874716
@bhvieira (Contributor, Author)

It appears to be related to specifying all four padding dimensions:

gpu(Conv((4,4), 1=>1, pad = (5,2,3,4)))(gpu(randn(Float32, 2,2,1,1)))

@DrChainsaw (Contributor)

Isn't this just because CUDA (cuDNN) does not support asymmetric padding? An even kernel size + SamePad() implies asymmetric padding.

#775
JuliaGPU/CUDA.jl#128
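
For reference, a quick sketch of where the asymmetry comes from (assuming Flux v0.12.3, where a Conv layer stores its expanded pad tuple in the pad field):

using Flux

# SamePad() has to distribute k - 1 = 3 pixels of padding per spatial dimension
# of the 4×4 kernel; 3 cannot be split evenly, so the stored pad tuple is
# asymmetric -- the same (2, 1, 2, 1) that shows up in the DenseConvDims of the
# stack trace above.
c = Conv((4, 4), 1 => 1, pad = SamePad())
c.pad  # (2, 1, 2, 1): 2 before and 1 after in each spatial dimension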

What happens is that NNlibCUDA attempts to change the padding to be symmetric (which imo is the wrong thing to do), but by that point the caller has already supplied an output buffer whose size assumes the asymmetric case.

The right way imo, and what I think all other libs do, is to manually pad the input and at the same time decrease the padding parameter so that it becomes symmetric.

I think there has been a bit of controversy as to where this should be done, which seems to be why it has fallen through the cracks. For instance, if one just does it in NNlibCUDA\src\cudnn\conv.jl:41, the caller might be surprised that x (the input features) has changed size. One can of course copy x and pad the copy, but that is not ideal either, since in the vast majority of cases the caller does not care whether x is changed.
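
A minimal user-level sketch of that manual-padding trick, emulating SamePad() for the 4×4 kernel from the MWE above (sizes hard-coded for illustration):

using Flux

# Pad the 2×2 input by hand: 2 zeros before and 1 after in each spatial
# dimension (matching the (2, 1, 2, 1) pad SamePad() computes), then run the
# layer with pad = 0 so cuDNN only ever sees symmetric (zero) padding.
x  = randn(Float32, 2, 2, 1, 1)
xp = zeros(Float32, 5, 5, 1, 1)   # 2 (before) + 2 (input) + 1 (after) per spatial dim
xp[3:4, 3:4, :, :] .= x           # original block offset by the "before" padding
c  = Conv((4, 4), 1 => 1, pad = 0)
y  = gpu(c)(gpu(xp))              # 2×2×1×1 output, the same shape SamePad() gives on the CPU

For comparison, this is what currently happens when the asymmetric pad reaches cuDNN: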

julia> cc = Conv((1,1), 1=>1, pad=(1,0,1,0)) |> gpu
Conv((1, 1), 1=>1)

julia> cc(ones(Float32, 1,1,1,1) |> gpu)
┌ Warning: cuDNN does not support asymmetric padding; defaulting to symmetric choice
└ @ NNlibCUDA E:\Programs\julia\.julia\dev\NNlibCUDA\src\cudnn\cudnn.jl:10
┌ Warning: No valid algorithm found, probably bad params for convolution.
└ @ CUDA.CUDNN E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:222
ERROR: CUDNNError: CUDNN_STATUS_BAD_PARAM (code 3)
Stacktrace:
  [1] throw_api_error(res::CUDA.CUDNN.cudnnStatus_t)
    @ CUDA.CUDNN E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\error.jl:22
  [2] macro expansion
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\error.jl:39 [inlined] 
  [3] cudnnConvolutionForward(handle::Ptr{Nothing}, alpha::Base.RefValue{Float32}, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, x::CUDA.CuArray{Float32, 4}, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, w::CUDA.CuArray{Float32, 4}, convDesc::CUDA.CUDNN.cudnnConvolutionDescriptor, algo::CUDA.CUDNN.cudnnConvolutionFwdAlgo_t, workSpace::CUDA.CuArray{UInt8, 1}, workSpaceSizeInBytes::Int64, beta::Base.RefValue{Float32}, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, y::CUDA.CuArray{Float32, 4})
    @ CUDA.CUDNN E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\utils\call.jl:26
  [4] macro expansion
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:105 [inlined]
  [5] macro expansion
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\utils\call.jl:144 [inlined]
  [6] cudnnConvolutionForwardAD(w::CUDA.CuArray{Float32, 4}, x::CUDA.CuArray{Float32, 4}, bias::Nothing, z::CUDA.CuArray{Float32, 4}; y::CUDA.CuArray{Float32, 4}, activation::CUDA.CUDNN.cudnnActivationMode_t, convDesc::CUDA.CUDNN.cudnnConvolutionDescriptor, wDesc::CUDA.CUDNN.cudnnFilterDescriptor, xDesc::CUDA.CUDNN.cudnnTensorDescriptor, yDesc::CUDA.CUDNN.cudnnTensorDescriptor, zDesc::CUDA.CUDNN.cudnnTensorDescriptor, biasDesc::Nothing, alpha::Base.RefValue{Float32}, beta::Base.RefValue{Float32}, dw::Base.RefValue{Any}, dx::Base.RefValue{Any}, dz::Base.RefValue{Any}, dbias::Base.RefValue{Any}, dready::Base.RefValue{Bool})
    @ CUDA.CUDNN E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:103
  [7] #cudnnConvolutionForwardWithDefaults#626
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:96 [inlined]
  [8] #cudnnConvolutionForward!#625
    @ E:\Programs\julia\.julia\packages\CUDA\3VnCC\lib\cudnn\convolution.jl:53 [inlined]
  [9] conv!(y::CUDA.CuArray{Float32, 4}, x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (1, 1), 1, 1, (1, 1), (1, 0, 1, 0), (1, 1), false}; alpha::Int64, beta::Int64, algo::Int64)
    @ NNlibCUDA E:\Programs\julia\.julia\dev\NNlibCUDA\src\cudnn\conv.jl:41
 [10] conv!
    @ E:\Programs\julia\.julia\dev\NNlibCUDA\src\cudnn\conv.jl:34 [inlined]
 [11] conv(x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (1, 1), 1, 1, (1, 1), (1, 0, 1, 0), (1, 1), false}; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib E:\Programs\julia\.julia\packages\NNlib\ev8gq\src\conv.jl:91
 [12] conv(x::CUDA.CuArray{Float32, 4}, w::CUDA.CuArray{Float32, 4}, cdims::DenseConvDims{2, (1, 1), 1, 1, (1, 1), (1, 0, 1, 0), (1, 1), false})
    @ NNlib E:\Programs\julia\.julia\packages\NNlib\ev8gq\src\conv.jl:89
 [13] (::Conv{2, 4, typeof(identity), CUDA.CuArray{Float32, 4}, CUDA.CuArray{Float32, 1}})(x::CUDA.CuArray{Float32, 4})
    @ Flux E:\Programs\julia\.julia\packages\Flux\6o4DQ\src\layers\conv.jl:157
 [14] top-level scope
    @ REPL[13]:1

@bhvieira (Contributor, Author)

That should be it then, @DrChainsaw. I remember seeing something like that some time ago (years ago already?).
So, closing, since it's a known issue.
Does this also bite Keras/PyTorch?

@DrChainsaw (Contributor)

I'm pretty sure both Keras and PyTorch do the manual padding trick, and the dilemma between copying and surprising mutation should exist there as well afaik. I did look into the TensorFlow C++ code long ago, but alas I don't remember what I found. Chances are I would have remembered it if it had been something other than a copy, though.

@ToucheSir (Member)

To my knowledge, PyTorch doesn't have a concept of automatically calculated "same" padding and requires specifying the amount for each dimension manually. I had a skim through some of the C++ code mentioned, and AFAICT allocating an enlarged buffer is necessary to get uneven same padding working with cuDNN.

@ToucheSir (Member)

PyTorch 1.9 now supports "same" padding and warns when the calculated padding is asymmetric: pytorch/pytorch#45667. Should we do the same?
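
A minimal sketch of that kind of check, applied to the expanded (l1, r1, l2, r2, ...) pad tuple seen above (hypothetical helper, not existing Flux/NNlib code):

# Warn when a pad tuple of the form (l1, r1, l2, r2, ...) is asymmetric,
# similar to what PyTorch 1.9 does for its "same" padding mode.
function warn_if_asymmetric(pad::NTuple{N, Int}) where {N}
    for i in 1:2:N
        if pad[i] != pad[i + 1]
            @warn "Asymmetric padding $pad is not supported by cuDNN; consider padding the input manually."
            return true
        end
    end
    return false
end

warn_if_asymmetric((2, 1, 2, 1))  # true: SamePad()'s result for a 4×4 kernel
warn_if_asymmetric((2, 2, 2, 2))  # false: symmetric, fine for cuDNN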

@bhvieira (Contributor, Author)

It's a bit niche, so perhaps leave it in the backlog in #1431.
