
Trying to increase speed by using CUDAExecutionProvider from onnxruntime-gpu instead of CPU onnxruntime; met with a warning about a CPU-GPU transfer bottleneck #62

Open
andrewtvuong opened this issue Jun 20, 2024 · 7 comments


@andrewtvuong

andrewtvuong commented Jun 20, 2024

I want to use CUDA instead of the CPU to increase the speed of tag inference.

My machine: Ubuntu 22.04.3 LTS (GNU/Linux 6.5.0-35-generic x86_64), CUDA 12.2.

I learned from https://onnxruntime.ai/docs/install/ that, as of this writing, if you have CUDA 12 you must install from the CUDA 12 package feed instead of running a plain `pip install onnxruntime-gpu`, which targets CUDA 11.
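The CUDA 12 install command from those docs:

```
pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```

This took me a while to figure out. I kept getting errors that didn't make sense: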

[E:onnxruntime:, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

[W:onnxruntime, onnxruntime_pybind_state.cc:870 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

I had those shared objects, but after reading carefully and reinstalling per the CUDA 12 instructions above, it worked. Using CUDAExecutionProvider instead of CPUExecutionProvider, however, did produce a new warning:

[W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 12 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.

Basically it's bottlenecked by CPU/GPU data transfer. I've been trying to figure it out but haven't managed to yet.
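For anyone reproducing this, here is roughly how the session gets set up; a minimal sketch, where `wd14.onnx` is a placeholder, not the tagger's actual file name:

```python
import onnxruntime as ort

# Sanity check: the GPU build should list CUDAExecutionProvider here.
print(ort.get_available_providers())

# Verbose logging, as the warning suggests, shows which nodes get
# assigned to the CPU and force the inserted Memcpy transfer nodes.
so = ort.SessionOptions()
so.log_severity_level = 1

# "wd14.onnx" is a placeholder path for the tagger model.
sess = ort.InferenceSession(
    "wd14.onnx",
    sess_options=so,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # the provider priority order actually in effect
```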

@andrewtvuong
Author

Opened this pull request to show what I tried: #63

@stanleyftf1005

Any luck fixing this? I'm having the same issue; the WD14 tagger has been a pain to run.

@andrewtvuong
Author

I just started looking into this yesterday and will try to fix it when I have more time. Just starting the discussion here in case I'm missing something.

@KurtCocain

It was working fine and then started doing this between two generations, no idea what I did. Has anyone fixed it yet?

@sddiky

sddiky commented Aug 16, 2024

When I have to use CUDAExecutionProvider (because another custom node requires onnxruntime-gpu), WD14 tagging becomes very slow.
Since I can't uninstall onnxruntime-gpu, I just changed ortProviders in pysssss.json from ["CUDAExecutionProvider","CPUExecutionProvider"] to ["CPUExecutionProvider","CUDAExecutionProvider"], and that solved the problem.
Hope this helps.
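For context, onnxruntime treats the providers list as a priority order, so listing CPUExecutionProvider first assigns the whole graph to the CPU even with onnxruntime-gpu installed, which avoids the Memcpy transfer nodes entirely. A minimal sketch of the same idea in plain onnxruntime (the model path is a placeholder):

```python
import onnxruntime as ort

# Providers are tried in list order; putting CPU first sidesteps the
# CPU<->GPU copies, at the cost of running inference on the CPU.
# "wd14.onnx" is a placeholder path.
sess = ort.InferenceSession(
    "wd14.onnx",
    providers=["CPUExecutionProvider", "CUDAExecutionProvider"],
)
print(sess.get_providers())  # shows the priority order in effect
```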

@smallersoup

> When I have to use CUDAExecutionProvider (because another custom node requires onnxruntime-gpu), WD14 tagging becomes very slow. Since I can't uninstall onnxruntime-gpu, I just changed ortProviders in pysssss.json from ["CUDAExecutionProvider","CPUExecutionProvider"] to ["CPUExecutionProvider","CUDAExecutionProvider"], and that solved the problem. Hope this helps.

I'm extremely grateful, the solution you provided has helped me a lot.

@ctrlz526

May I ask if your problem has been solved? I've encountered this issue as well.
