error msg is "CUDA: NVIDIA driver is installed, but CUDA runtime is not" - yet OLLAMA SERVE runs fine with only driver #419
-
OLLAMA SERVE runs fine without CUDA installed (only the driver), but node-llama-cpp gets an error. See messages from both below.

npx node-llama-cpp inspect gpu

CUDA: NVIDIA driver is installed, but CUDA runtime is not
Vulkan device: Quadro T1000
CPU model: Intel(R) Xeon(R) W-10885M CPU @ 2.40GHz

from OLLAMA SERVE
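For what it's worth, the backend selection that `inspect gpu` reports can also be checked from code. A minimal sketch, assuming node-llama-cpp v3's `getLlama()` and that the returned instance exposes a `gpu` getter naming the selected backend:

```ts
// Minimal sketch: check from code which GPU backend node-llama-cpp selects,
// similar to what `npx node-llama-cpp inspect gpu` reports.
// Assumes node-llama-cpp v3's getLlama() and that the returned instance
// exposes a `gpu` getter ("cuda", "vulkan", "metal", or false).
import {getLlama} from "node-llama-cpp";

const llama = await getLlama(); // auto-selects the best backend it can load
console.log("Selected GPU backend:", llama.gpu);
```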
-
@amcintyre99 Can you please run these commands and share their results with me? It'll help me investigate this issue.

cat /etc/os-release
find /usr/local/cuda* /usr/lib* -name "libnvidia*.so*"
find /usr/local/cuda* /usr/lib* -name "libcuda*.so*"
find /usr/local/cuda* /usr/lib* -name "libcublas*.so*"

Run this command inside of your project where node-llama-cpp is installed:

ldd ./node_modules/@node-llama-cpp/linux-x64-cuda/bins/linux-x64-cuda/libggml-cuda.so

Also, please try to run inference forcibly with both CUDA and Vulkan and let me know whether each of them worked for you (a programmatic equivalent is sketched after these commands):

npx -y node-llama-cpp chat --prompt 'Hi there!' --gpu cuda "hf:mradermacher/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct.Q4_K_M.gguf"
npx -y node-llama-cpp chat --prompt 'Hi there!' --gpu vulkan "hf:mradermacher/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct.Q4_K_M.gguf"
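The same forced-backend runs can also be reproduced programmatically. A minimal sketch, assuming node-llama-cpp v3's `getLlama()` and `LlamaChatSession` API; the model path is a placeholder for a locally downloaded GGUF file:

```ts
// Minimal sketch: force a specific GPU backend for inference, mirroring the
// `--gpu cuda` / `--gpu vulkan` CLI flags above. Model path is a placeholder.
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama({gpu: "cuda"}); // or {gpu: "vulkan"}
const model = await llama.loadModel({
    modelPath: "./models/Llama-3.2-3B-Instruct.Q4_K_M.gguf" // placeholder
});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

console.log(await session.prompt("Hi there!"));
```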
-
Easier to use a gist for this:

I never compile the models, I just run them. Since OLLAMA runs them fine on this 4 GB GPU, and on a larger 11 GB GPU on another PC, I didn't worry about the toolkit. I assumed I would use your package the same way, just to run a downloaded model.
Thanks for helping me investigate this.
It appears that the Ollama installation comes bundled with two versions of the CUDA libraries, so while you don't have to install CUDA yourself, Ollama does put the required CUDA files onto your machine, but only for its own use.
However, it also means that Ollama doesn't fully utilize your hardware, since a full dedicated CUDA installation can take advantage of more microarchitecture features available on your specific GPU.
From my tests, the Vulkan support is as performant as the CUDA support (in some cases it was even slightly faster), so when a full CUDA installation isn't available, Vulkan is a good alternative.
Vulkan is always used by default as …
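As an illustration of that fallback behavior, here is a minimal sketch that prefers CUDA and falls back to Vulkan. It is not the library's built-in selection logic; it assumes `getLlama()` rejects when the requested backend can't be loaded, and the helper name is made up for this example:

```ts
// Minimal sketch (not the library's built-in logic): prefer CUDA when a full
// CUDA installation is usable, otherwise fall back to Vulkan.
// Assumes getLlama() rejects when the requested backend can't be loaded.
import {getLlama, type Llama} from "node-llama-cpp";

async function getLlamaWithFallback(): Promise<Llama> {
    try {
        return await getLlama({gpu: "cuda"});
    } catch {
        return await getLlama({gpu: "vulkan"});
    }
}

const llama = await getLlamaWithFallback();
console.log("Using backend:", llama.gpu);
```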