Dynamic CUDA driver loader #1841

Open · wants to merge 1 commit into master

Conversation

@didzis (Contributor) commented Feb 7, 2024

This PR implements an optional dynamic CUDA driver loader together with static linking against the CUDA runtime.

As a result, CUDA-enabled binaries can run without recompilation on systems with or without CUDA-capable GPUs (and the CUDA driver), falling back to alternative computation methods when CUDA is unavailable.
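
To make the mechanism concrete, here is a minimal sketch of the underlying technique on Linux. It is not this PR's cuda-loader.c; it only shows the generic dlopen/dlsym pattern, with local typedefs standing in for the driver API signatures (cuInit, cuDeviceGetCount) so no CUDA headers are needed at build time.

```c
// Minimal sketch (not the PR's code): probe for the CUDA driver at run time.
// Builds on Linux with: cc probe.c -ldl
#include <dlfcn.h>
#include <stdio.h>

// Local stand-ins for the driver API signatures (CUresult is an int-sized enum, 0 == success).
typedef int (*cuInit_t)(unsigned int flags);
typedef int (*cuDeviceGetCount_t)(int *count);

int main(void) {
    // libcuda.so.1 ships with the NVIDIA kernel driver package, not with the CUDA Toolkit.
    void *drv = dlopen("libcuda.so.1", RTLD_NOW | RTLD_GLOBAL);
    if (!drv) {
        printf("CUDA driver not found, using CPU fallback\n");
        return 0;
    }

    cuInit_t           p_cuInit           = (cuInit_t)           dlsym(drv, "cuInit");
    cuDeviceGetCount_t p_cuDeviceGetCount = (cuDeviceGetCount_t) dlsym(drv, "cuDeviceGetCount");

    int count = 0;
    if (p_cuInit && p_cuDeviceGetCount &&
        p_cuInit(0) == 0 && p_cuDeviceGetCount(&count) == 0 && count > 0) {
        printf("found %d CUDA device(s), GPU path available\n", count);
    } else {
        printf("driver present but unusable, using CPU fallback\n");
    }

    dlclose(drv);
    return 0;
}
```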

@jettoblack

Hi @didzis, this is a great idea and I'm giving it a try now on Windows, but without success. I built the Windows DLL with -DWHISPER_CUBLAS=1 -DWHISPER_DYNAMIC_CUDA=1 using CUDA 11.8. On a system with CUDA installed it works, but on a system without CUDA it fails to load the DLL due to a missing dependency on nvcuda.dll (which is not a redistributable file). This is normally installed by CUDA in c:\windows\system32. I couldn't find where in the makefile this gets linked into whisper.dll, though. Any suggestions, or maybe something I did wrong? Thanks!

@didzis (Contributor, Author) commented Feb 10, 2024

Hi, this was implemented only for non-Windows systems, but I made an attempt to support the Windows platform in the latest commit. I don't have any means to test it myself, and you may need to change the driver DLL name. Note that there is a comment stating that no static cuBLAS library has been available since CUDA Toolkit 12.3.1, and thus static linking for cuBLAS is disabled. If this approach works for you, then a version check for older CUDA Toolkits may solve this. It should work as is with the dynamic cuBLAS library, it's just that dynamic linking against any CUDA library defeats the purpose of all this.
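
For reference, below is a hedged sketch of the equivalent run-time loading on Windows, assuming nvcuda.dll is the driver DLL to load (as discussed above). It shows only the generic LoadLibrary/GetProcAddress pattern, not this PR's actual implementation.

```c
// Minimal sketch (not the PR's code): probe for the CUDA driver DLL on Windows.
#include <windows.h>
#include <stdio.h>

// The driver API uses the stdcall convention on Windows (a no-op on x64).
typedef int (WINAPI *cuInit_t)(unsigned int flags);

int main(void) {
    // Loading nvcuda.dll at run time avoids a hard import-table dependency,
    // so the executable still starts on machines without the NVIDIA driver.
    HMODULE drv = LoadLibraryA("nvcuda.dll");
    if (!drv) {
        printf("CUDA driver not found, using CPU fallback\n");
        return 0;
    }

    cuInit_t p_cuInit = (cuInit_t) GetProcAddress(drv, "cuInit");
    if (p_cuInit && p_cuInit(0) == 0) {
        printf("CUDA driver initialized, GPU path available\n");
    } else {
        printf("driver present but unusable, using CPU fallback\n");
    }

    FreeLibrary(drv);
    return 0;
}
```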

@didzis (Contributor, Author) commented Feb 10, 2024

@ggerganov, here it is possible to embed the contents of cuda-loader.c into ggml-cuda.cu - tested, it works.

@slaren (Collaborator) commented Feb 11, 2024

My long-term goal for addressing this is to move the backends into dynamic libraries loadable at run time; then we could use a single build for all the backends. I don't think this is going to work on Windows for the reasons already mentioned: some CUDA libraries do not have static versions on Windows, so the executable will depend on the CUDA DLLs regardless.
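
To illustrate the direction described here, the following is a hedged sketch of loading a backend as a shared library at run time. The library name and the entry-point symbol are hypothetical placeholders, not an existing ggml or whisper.cpp API.

```c
// Hypothetical sketch of run-time backend loading; all names are placeholders.
#include <dlfcn.h>
#include <stdio.h>

typedef void *(*backend_init_fn)(void);   // placeholder entry-point signature

int main(void) {
    // A hypothetical CUDA backend built as its own shared library.
    void *handle = dlopen("libggml-backend-cuda.so", RTLD_NOW);
    if (!handle) {
        printf("CUDA backend library not available: %s\n", dlerror());
        printf("falling back to the CPU backend\n");
        return 0;
    }

    backend_init_fn init = (backend_init_fn) dlsym(handle, "backend_init");
    if (!init) {
        printf("backend library lacks the expected entry point\n");
        dlclose(handle);
        return 0;
    }

    void *backend = init();   // the loaded backend would be used from here on
    printf("backend loaded: %p\n", backend);

    dlclose(handle);
    return 0;
}
```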

@ggerganov (Owner)

Ok, to me it seems better to aim for the more general solution and for now not merge this change.

@didzis (Contributor, Author) commented Feb 12, 2024

I didn't want to step into the Windows realm with this PR, as it was intended as a Linux-only feature, so I have reverted this PR to a Linux-only solution.

I also checked multiple CUDA Toolkit releases for Windows, and unfortunately it is as mentioned before: the static cuBLAS libraries are missing from the Windows releases.

The general solution mentioned above is great; however, it has some disadvantages:

  • for CUDA it still requires a compatible version of the CUDA Toolkit to be installed on the target machine;
  • the dynamic libraries must be distributed along with the binary; this is especially noticeable when a static libwhisper is embedded into the binary of another application that has its own distribution requirements;
  • the feature is not ready yet; there is still work to be done.

With this PR the CUDA code is made optional by dynamically loading only libcuda.so (if present), which is part of the NVIDIA kernel driver package; thus no CUDA Toolkit is required on the target machine, and there is no interference with any already installed, possibly incompatible toolkit version. A minimum libcuda.so driver version is required, but it depends on the version of the CUDA Toolkit static libraries used when linking the application. To maximize coverage, an older CUDA Toolkit can be used.

A quote from the NVIDIA documentation here:

Note that in the latter case, the library cuda is not needed. The CUDA Runtime will try to open explicitly the cuda library if needed. In the case of a system which does not have the CUDA driver installed, this allows the application to gracefully manage this issue and potentially run if a CPU-only path is available.

The static cuBLAS library itself does the same: it loads libcuda.so dynamically if needed and available.
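
As a hedged illustration of that graceful fallback (using only public CUDA runtime API calls; the probe and the fallback policy below are illustrative, not this PR's code), an application linked against the static runtime can do something like this:

```c
// Illustrative probe (not the PR's code): with the CUDA runtime linked in
// statically, the runtime opens libcuda.so lazily, so on a machine without
// the driver this call fails gracefully (e.g. cudaErrorInsufficientDriver or
// cudaErrorNoDevice) instead of the program failing at load time.
#include <cuda_runtime_api.h>
#include <stdio.h>

static int gpu_usable(void) {
    int count = 0;
    return cudaGetDeviceCount(&count) == cudaSuccess && count > 0;
}

int main(void) {
    printf(gpu_usable() ? "using the CUDA path\n"
                        : "no usable CUDA driver/GPU, using the CPU fallback\n");
    return 0;
}
```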

Although no native cuBLAS static library is available for Windows, CUDA can be used with Windows Subsystem for Linux 2, which is a Linux system where this PR still applies out of the box:

The latest NVIDIA Windows GPU Driver will fully support WSL 2. With CUDA support in the driver, existing applications (compiled elsewhere on a Linux system for the same target GPU) can run unmodified within the WSL environment.
...
Once a Windows NVIDIA GPU driver is installed on the system, CUDA becomes available within WSL 2. The CUDA driver installed on Windows host will be stubbed inside the WSL 2 as libcuda.so, therefore users must not install any NVIDIA GPU Linux driver within WSL 2.

I understand that no other options are left for native Windows applications, but I fail to see any reason not to support both approaches on the Linux platform (or WSL 2 on Windows).

@ggerganov I believe it's still worth considering merging this optional (and small) feature in one form or another (i.e., the solution could also be merged into ggml-cuda.cu). What do you think, given the above?

…ntime

This approach lets CUDA-enabled binaries run on systems without CUDA-capable GPUs and fall back to alternative computation methods.