Added cudnn_frontend API in Caffe to support CUDA 11 + cuDNN 8 #2184

Open
cheneeheng wants to merge 1 commit into master
I tested this setup with CUDA 11.7 + cuDNN 8.5 on a GTX 1660 Ti. OpenPose runs normally for human pose extraction, without the huge GPU memory usage issue. GPU memory usage is the same as with the CUDA 10.2 + cuDNN 7 setup, and inference is about 1 fps faster.

Hopefully this helps anyone who really needs to use CUDA 11.

Changelog:

  • added the cudnn-frontend submodule.
  • updated CMake with a new flag and a new 3rdparty repository, cudnn_frontend.
  • changed the caffe submodule's target repository, which includes:
    -- added the DUSE_CUDNN_FRONTEND option. For cuDNN 8, it uses the frontend API instead of the current algorithm-selection wrapper cudnnGetConvolutionForwardAlgorithm_v7.
    -- added cudnn_v8_utils.hpp and cudnn_v8_utils.cpp for the cudnn_frontend API. These currently support only the forward pass.
    -- fixed warnings.
    -- reduced GPU memory usage by setting CUDNN_STREAMS_PER_GROUP=1.
    -- added a compute capability check during tensor creation to enable tensor core usage on Ampere cards.
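Why does CUDNN_STREAMS_PER_GROUP=1 save memory? In upstream BVLC Caffe's `cudnn_conv_layer.cpp` (assumed here to be unchanged in the Caffe fork OpenPose uses), the macro defaults to 3 so the backward pass can compute bias, weight and data gradients on separate streams, and each convolution layer reserves its largest workspace once per (group, stream) pair. The forward pass only uses one stream per group, so for inference the extra slots are reserved but never touched. The sketch below reproduces only that sizing arithmetic; `total_workspace_bytes` and the 64 MiB figure are hypothetical illustrations, not measurements from this PR.

```cpp
#include <cstddef>

// Sketch of the workspace sizing rule assumed from upstream Caffe's
// CuDNNConvolutionLayer::Reshape: one max-sized workspace slot is
// reserved for every (group, stream) pair.
constexpr std::size_t total_workspace_bytes(std::size_t max_workspace_bytes,
                                            std::size_t group,
                                            std::size_t streams_per_group) {
  return max_workspace_bytes * group * streams_per_group;
}

constexpr std::size_t kMiB = 1024u * 1024u;

// Hypothetical layer: a single-group convolution whose largest
// workspace requirement is 64 MiB.
constexpr std::size_t kMaxWorkspace = 64u * kMiB;

// Upstream default (CUDNN_STREAMS_PER_GROUP = 3): 192 MiB reserved.
static_assert(total_workspace_bytes(kMaxWorkspace, 1, 3) == 192u * kMiB,
              "three streams reserve three workspace slots");

// Setting used in this PR (CUDNN_STREAMS_PER_GROUP = 1): 64 MiB reserved.
static_assert(total_workspace_bytes(kMaxWorkspace, 1, 1) == 64u * kMiB,
              "one stream reserves a single workspace slot");
```

Because the checks are `static_assert`s, compiling the snippet is enough to verify the arithmetic. The saving scales with the number of convolution layers, since each layer sizes its own buffer this way.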

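The PR description does not show the compute capability check itself. As a purely hypothetical sketch of the idea, the helper below (`is_ampere_or_newer` is an illustrative name, not taken from the PR) enables a tensor-core path only when the device's compute capability major version is 8 or higher, i.e. Ampere or a later architecture. In real code, the major version would come from `cudaDeviceProp::major`, filled in by `cudaGetDeviceProperties`. On the GTX 1660 Ti used for testing (compute capability 7.5), such a check leaves the path disabled.

```cpp
// Hypothetical helper (not taken from the PR): returns true when the
// device's compute capability is 8.0 or newer, i.e. Ampere or a later
// architecture, the first generation whose tensor cores accept TF32.
constexpr bool is_ampere_or_newer(int cc_major) {
  return cc_major >= 8;
}

// GTX 1660 Ti (Turing, compute capability 7.5): path stays disabled.
static_assert(!is_ampere_or_newer(7), "Turing is pre-Ampere");
// RTX 30-series (Ampere, compute capability 8.6): path enabled.
static_assert(is_ampere_or_newer(8), "Ampere qualifies");
```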