error msg is "CUDA: NVIDIA driver is installed, but CUDA runtime is not" - yet OLLAMA SERVE runs fine with only driver #419
-
OLLAMA SERVE runs fine without CUDA installed (only the driver), but node-llama-cpp gets an error. See messages from both below.

npx node-llama-cpp inspect gpu

CUDA: NVIDIA driver is installed, but CUDA runtime is not
Vulkan device: Quadro T1000
CPU model: Intel(R) Xeon(R) W-10885M CPU @ 2.40GHz

from OLLAMA SERVE
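For what it's worth, the backend selection that `inspect gpu` reports can also be checked from code. A minimal sketch, assuming node-llama-cpp v3's `getLlama()` and that the returned instance exposes a `gpu` getter naming the selected backend:

```ts
// Minimal sketch: check from code which GPU backend node-llama-cpp selects,
// similar to what `npx node-llama-cpp inspect gpu` reports.
// Assumes node-llama-cpp v3's getLlama() and that the returned instance
// exposes a `gpu` getter ("cuda", "vulkan", "metal", or false).
import {getLlama} from "node-llama-cpp";

const llama = await getLlama(); // auto-selects the best backend it can load
console.log("Selected GPU backend:", llama.gpu);
```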
-
@amcintyre99 Can you please run these commands and share their results with me? It'll help me investigate this issue.

cat /etc/os-release
find /usr/local/cuda* /usr/lib* -name "libnvidia*.so*"
find /usr/local/cuda* /usr/lib* -name "libcuda*.so*"
find /usr/local/cuda* /usr/lib* -name "libcublas*.so*"

Run this command inside of your project where node-llama-cpp is installed:

ldd ./node_modules/@node-llama-cpp/linux-x64-cuda/bins/linux-x64-cuda/libggml-cuda.so

Also, please try to run inference forcibly with both CUDA and Vulkan and let me know whether each of them worked for you (a programmatic equivalent is sketched after these commands):

npx -y node-llama-cpp chat --prompt 'Hi there!' --gpu cuda "hf:mradermacher/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct.Q4_K_M.gguf"
npx -y node-llama-cpp chat --prompt 'Hi there!' --gpu vulkan "hf:mradermacher/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct.Q4_K_M.gguf"
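The same forced-backend runs can also be reproduced programmatically. A minimal sketch, assuming node-llama-cpp v3's `getLlama()` and `LlamaChatSession` API; the model path is a placeholder for a locally downloaded GGUF file:

```ts
// Minimal sketch: force a specific GPU backend for inference, mirroring the
// `--gpu cuda` / `--gpu vulkan` CLI flags above. Model path is a placeholder.
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama({gpu: "cuda"}); // or {gpu: "vulkan"}
const model = await llama.loadModel({
    modelPath: "./models/Llama-3.2-3B-Instruct.Q4_K_M.gguf" // placeholder
});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

console.log(await session.prompt("Hi there!"));
```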
-
Easier to use a gist for this:

I never compile the models, I just run them. Since OLLAMA runs them fine on this 4 GB GPU, and on a larger 11 GB GPU on another PC, I didn't worry about the toolkit. I assumed I would use your package the same way, just to run a downloaded model.
Thanks for helping me investigate this.
It appears that the Ollama installation comes bundled with two versions of the CUDA libraries, so while you don't have to install CUDA yourself, Ollama does put the required CUDA files onto your machine, but only for its own use.
However, it also means that Ollama doesn't fully utilize your hardware, since a full dedicated CUDA installation can take advantage of more microarchitecture features available on your specific GPU.
From my tests, the Vulkan support is as performant as the CUDA support (in some cases it was even slightly faster), so when a full CUDA installation isn't available, Vulkan is a good alternative.
Vulkan is always used by default as …
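As an illustration of that fallback behavior, here is a minimal sketch that prefers CUDA and falls back to Vulkan. It is not the library's built-in selection logic; it assumes `getLlama()` rejects when the requested backend can't be loaded, and the helper name is made up for this example:

```ts
// Minimal sketch (not the library's built-in logic): prefer CUDA when a full
// CUDA installation is usable, otherwise fall back to Vulkan.
// Assumes getLlama() rejects when the requested backend can't be loaded.
import {getLlama, type Llama} from "node-llama-cpp";

async function getLlamaWithFallback(): Promise<Llama> {
    try {
        return await getLlama({gpu: "cuda"});
    } catch {
        return await getLlama({gpu: "vulkan"});
    }
}

const llama = await getLlamaWithFallback();
console.log("Using backend:", llama.gpu);
```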