Trying to increase speed by using CUDAExecutionProvider from onnxruntime-gpu instead of CPU onnxruntime, met with warning about CPU-GPU transfer bottleneck #62
Comments
Opened this PR to show what I tried: #63
Any luck fixing this? I'm having the same issue; the WD14 tagger has been a pain to run.
I just started looking into this yesterday and will try to fix it when I have more time. Just starting the discussion here in case I miss something.
It was working fine and then it started doing this between two generations; no idea what I did. Has anyone fixed it yet?
When I have to switch to
I'm extremely grateful, the solution you provided has helped me a lot. |
May I ask if your problem has been solved? I have encountered this issue as well.
I want to use CUDA instead of the CPU to increase the speed of tag inference.
My machine: Ubuntu 22.04.3 LTS (GNU/Linux 6.5.0-35-generic x86_64), CUDA 12.2.
I learned from https://onnxruntime.ai/docs/install/ that if you have CUDA 12 you must install with
pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
instead of simply pip install onnxruntime-gpu, which is the CUDA 11 build as of the time of writing.
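To confirm the right build is actually the one being imported, a quick sanity check I find useful (my own snippet, not from the docs or this thread):

```python
# Confirm the GPU wheel is installed and the CUDA provider is available.
import onnxruntime as ort

print(ort.__version__)
print(ort.get_available_providers())
# Expect CUDAExecutionProvider in the list; if only CPUExecutionProvider
# shows up, a plain onnxruntime install is likely shadowing onnxruntime-gpu.
```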
Figuring out the right package took me a while. I kept getting errors that didn't make sense, complaining about objects I clearly had, but after reading carefully and reinstalling the CUDA 12 build as above, it worked. Using CUDAExecutionProvider instead of CPUExecutionProvider did, however, produce a new warning:
[W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 12 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
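The warning itself points at session_options.log_severity_level=1 to see which nodes the Memcpy ops get attached to. A minimal sketch of passing that when creating the session (the model path is a placeholder, not the tagger's real file name):

```python
# Create the session with more verbose logging so the Memcpy details
# referenced by the warning are actually printed.
import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.log_severity_level = 1  # info-level logs, as the warning suggests

session = ort.InferenceSession(
    "wd14_tagger.onnx",  # placeholder path to the tagger model
    sess_options=sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())
```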
Basically, inference looks bottlenecked by CPU/GPU data transfer. I've been trying to figure it out but haven't managed to yet.
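I haven't verified this on the tagger model yet, but the usual onnxruntime approach to cutting host/device copies is I/O binding, which keeps the input and output tensors in GPU memory across runs. (The internal Memcpy nodes from the warning typically come from ops that fall back to the CPU provider, so this may only remove part of the overhead.) A rough sketch, with placeholder model path, input shape, and tensor names:

```python
# I/O binding sketch: copy the input to the GPU once, let ORT allocate the
# output on the GPU, and only pull results back to the CPU when needed.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "wd14_tagger.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Dummy NHWC batch; check session.get_inputs() for the model's real shape.
image = np.random.rand(1, 448, 448, 3).astype(np.float32)

x_gpu = ort.OrtValue.ortvalue_from_numpy(image, "cuda", 0)

binding = session.io_binding()
binding.bind_ortvalue_input(input_name, x_gpu)
binding.bind_output(output_name, "cuda", 0)

session.run_with_iobinding(binding)

probs = binding.get_outputs()[0].numpy()  # host copy happens only here
print(probs.shape)
```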