Dynamic CUDA driver loader #1841

Open · wants to merge 1 commit into master

Conversation

@didzis (Contributor) commented Feb 7, 2024

This PR implements an optional dynamic CUDA driver loader together with static linking against the CUDA runtime.

As a result, CUDA-enabled binaries can run without recompilation on systems with or without CUDA-capable GPUs (and the CUDA driver), falling back to alternative computation methods when CUDA is unavailable.
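
To make the mechanism concrete, here is a minimal sketch of the underlying technique on Linux. It is not this PR's cuda-loader.c; it only shows the generic dlopen/dlsym pattern, with local typedefs standing in for the driver API signatures (cuInit, cuDeviceGetCount) so no CUDA headers are needed at build time.

```c
// Minimal sketch (not the PR's code): probe for the CUDA driver at run time.
// Builds on Linux with: cc probe.c -ldl
#include <dlfcn.h>
#include <stdio.h>

// Local stand-ins for the driver API signatures (CUresult is an int-sized enum, 0 == success).
typedef int (*cuInit_t)(unsigned int flags);
typedef int (*cuDeviceGetCount_t)(int *count);

int main(void) {
    // libcuda.so.1 ships with the NVIDIA kernel driver package, not with the CUDA Toolkit.
    void *drv = dlopen("libcuda.so.1", RTLD_NOW | RTLD_GLOBAL);
    if (!drv) {
        printf("CUDA driver not found, using CPU fallback\n");
        return 0;
    }

    cuInit_t           p_cuInit           = (cuInit_t)           dlsym(drv, "cuInit");
    cuDeviceGetCount_t p_cuDeviceGetCount = (cuDeviceGetCount_t) dlsym(drv, "cuDeviceGetCount");

    int count = 0;
    if (p_cuInit && p_cuDeviceGetCount &&
        p_cuInit(0) == 0 && p_cuDeviceGetCount(&count) == 0 && count > 0) {
        printf("found %d CUDA device(s), GPU path available\n", count);
    } else {
        printf("driver present but unusable, using CPU fallback\n");
    }

    dlclose(drv);
    return 0;
}
```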

@jettoblack

Hi @didzis, this is a great idea and I'm giving it a try now on Windows, but without success. I built the Windows DLL with -DWHISPER_CUBLAS=1 -DWHISPER_DYNAMIC_CUDA=1 using CUDA 11.8. On a system with CUDA installed it works, but on a system without CUDA it fails to load the DLL due to a missing dependency on nvcuda.dll (which is not a redistributable file). This is normally installed by CUDA in c:\windows\system32. I couldn't find where in the makefile this gets linked into whisper.dll, though. Any suggestions, or maybe something I did wrong? Thanks!

@didzis (Contributor, Author) commented Feb 10, 2024

Hi, this was implemented only for non-Windows systems, but I made an attempt to support the Windows platform in the latest commit. I don't have any means to test it myself, and you may need to change the driver DLL name. Note that there is a comment stating that no static cuBLAS library has been available since CUDA Toolkit 12.3.1, and thus static linking for cuBLAS is disabled. If this approach works for you, then a version check for older CUDA Toolkits may solve this. It should work as is with the dynamic cuBLAS library, it's just that dynamic linking against any CUDA library defeats the purpose of all this.
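
For reference, below is a hedged sketch of the equivalent run-time loading on Windows, assuming nvcuda.dll is the driver DLL to load (as discussed above). It shows only the generic LoadLibrary/GetProcAddress pattern, not this PR's actual implementation.

```c
// Minimal sketch (not the PR's code): probe for the CUDA driver DLL on Windows.
#include <windows.h>
#include <stdio.h>

// The driver API uses the stdcall convention on Windows (a no-op on x64).
typedef int (WINAPI *cuInit_t)(unsigned int flags);

int main(void) {
    // Loading nvcuda.dll at run time avoids a hard import-table dependency,
    // so the executable still starts on machines without the NVIDIA driver.
    HMODULE drv = LoadLibraryA("nvcuda.dll");
    if (!drv) {
        printf("CUDA driver not found, using CPU fallback\n");
        return 0;
    }

    cuInit_t p_cuInit = (cuInit_t) GetProcAddress(drv, "cuInit");
    if (p_cuInit && p_cuInit(0) == 0) {
        printf("CUDA driver initialized, GPU path available\n");
    } else {
        printf("driver present but unusable, using CPU fallback\n");
    }

    FreeLibrary(drv);
    return 0;
}
```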

@didzis (Contributor, Author) commented Feb 10, 2024

@ggerganov, here it is possible to embed the contents of cuda-loader.c into ggml-cuda.cu - tested, it works.

@slaren (Collaborator) commented Feb 11, 2024

My long-term goal for addressing this is to move the backends into dynamic libraries loadable at run time; then we could use a single build for all the backends. I don't think this is going to work on Windows for the reasons already mentioned: some CUDA libraries do not have static versions on Windows, so the executable will depend on the CUDA DLLs regardless.
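
To illustrate the direction described here, the following is a hedged sketch of loading a backend as a shared library at run time. The library name and the entry-point symbol are hypothetical placeholders, not an existing ggml or whisper.cpp API.

```c
// Hypothetical sketch of run-time backend loading; all names are placeholders.
#include <dlfcn.h>
#include <stdio.h>

typedef void *(*backend_init_fn)(void);   // placeholder entry-point signature

int main(void) {
    // A hypothetical CUDA backend built as its own shared library.
    void *handle = dlopen("libggml-backend-cuda.so", RTLD_NOW);
    if (!handle) {
        printf("CUDA backend library not available: %s\n", dlerror());
        printf("falling back to the CPU backend\n");
        return 0;
    }

    backend_init_fn init = (backend_init_fn) dlsym(handle, "backend_init");
    if (!init) {
        printf("backend library lacks the expected entry point\n");
        dlclose(handle);
        return 0;
    }

    void *backend = init();   // the loaded backend would be used from here on
    printf("backend loaded: %p\n", backend);

    dlclose(handle);
    return 0;
}
```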

@ggerganov (Owner)

Ok, to me it seems better to aim for the more general solution and for now not merge this change.

@didzis (Contributor, Author) commented Feb 12, 2024

I didn't want to step into the Windows realm with this PR, as it was intended as a Linux-only feature, so I have reverted this PR to a Linux-only solution.

I also checked multiple CUDA Toolkit releases for Windows, and unfortunately it is as mentioned before: the static cuBLAS libraries are missing from the Windows releases.

The general solution mentioned above is great; however, it has some disadvantages:

  • for CUDA it still requires a compatible version of the CUDA Toolkit to be installed on the target machine;
  • the dynamic libraries must be distributed along with the binary; this is especially noticeable when a static libwhisper is embedded into the binary of another application that has its own distribution requirements;
  • the feature is not ready yet; there is still work to be done.

With this PR the CUDA code is made optional by dynamically loading only libcuda.so (if present), which is part of the NVIDIA kernel driver package; thus no CUDA Toolkit is required on the target machine, and there is no interference with any already installed, possibly incompatible toolkit version. A minimum libcuda.so driver version is required, but it depends on the version of the CUDA Toolkit static libraries used when linking the application. To maximize coverage, an older CUDA Toolkit can be used.

A quote from the NVIDIA documentation here:

Note that in the latter case, the library cuda is not needed. The CUDA Runtime will try to open explicitly the cuda library if needed. In the case of a system which does not have the CUDA driver installed, this allows the application to gracefully manage this issue and potentially run if a CPU-only path is available.

The static cuBLAS library itself does the same: it loads libcuda.so dynamically if needed and available.
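
As a hedged illustration of that graceful fallback (using only public CUDA runtime API calls; the probe and the fallback policy below are illustrative, not this PR's code), an application linked against the static runtime can do something like this:

```c
// Illustrative probe (not the PR's code): with the CUDA runtime linked in
// statically, the runtime opens libcuda.so lazily, so on a machine without
// the driver this call fails gracefully (e.g. cudaErrorInsufficientDriver or
// cudaErrorNoDevice) instead of the program failing at load time.
#include <cuda_runtime_api.h>
#include <stdio.h>

static int gpu_usable(void) {
    int count = 0;
    return cudaGetDeviceCount(&count) == cudaSuccess && count > 0;
}

int main(void) {
    printf(gpu_usable() ? "using the CUDA path\n"
                        : "no usable CUDA driver/GPU, using the CPU fallback\n");
    return 0;
}
```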

Although no native cuBLAS static library is available for Windows, CUDA can be used with Windows Subsystem for Linux 2, which is a Linux system where this PR still applies out of the box:

The latest NVIDIA Windows GPU Driver will fully support WSL 2. With CUDA support in the driver, existing applications (compiled elsewhere on a Linux system for the same target GPU) can run unmodified within the WSL environment.
...
Once a Windows NVIDIA GPU driver is installed on the system, CUDA becomes available within WSL 2. The CUDA driver installed on Windows host will be stubbed inside the WSL 2 as libcuda.so, therefore users must not install any NVIDIA GPU Linux driver within WSL 2.

I understand that no other options are left for native Windows applications, but I fail to see any reason not to support both approaches on the Linux platform (or WSL 2 on Windows).

@ggerganov I believe it's still worth considering merging this optional (and small) feature in one form or another (i.e., the solution could also be merged into ggml-cuda.cu). What do you think, given the above?

…ntime

This approach lets CUDA-enabled binaries run on systems without CUDA-capable GPUs and fall back to alternative computation methods.