
Khoj does not work because of "cudaMalloc failed: out of memory" on Debian 12.11 with CUDA 12.9 #1215

@N0rbert

Server

  • Cloud (https://app.khoj.dev)
  • Self-Hosted Docker
  • Self-Hosted Python package
  • Self-Hosted source code

Clients

  • Web browser
  • Desktop/mobile app
  • Obsidian
  • Emacs
  • WhatsApp

OS

  • Windows
  • macOS
  • Linux
  • Android
  • iOS

Khoj version

1.42.10

Describe the bug

I'm using Debian 12.11 with CUDA 12.9.
Other local GPU-based LLM tools such as LocalAI and Open-WebUI work normally on this machine, and the LocalScore benchmark also runs.
The GPU is an NVIDIA RTX 2000 Ada Generation with 16 GB of VRAM.

I installed Khoj using the commands below:

cd ~/Software
mkdir khoj
cd khoj
python3 -m venv .venv
source .venv/bin/activate

export CUDACXX=/usr/local/cuda-12.9/bin/nvcc
export CUDA_PATH=/usr/local/cuda-12.9
export CUDAToolkit_ROOT=/usr/local/cuda-12.9/

CMAKE_ARGS="-DGGML_CUDA=on -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.9/targets/x86_64-linux/include" FORCE_CMAKE=1 python -m pip install 'khoj[local]'

Then I ran it with:

cd ~/Software/khoj
source .venv/bin/activate

USE_EMBEDDED_DB="true" KHOJ_TELEMETRY_DISABLE="true" khoj --anonymous-mode

Current Behavior

The most relevant lines from the console log:

[10:15:08.169977] INFO     khoj: 🌘 Starting Khoj                                                                                     main.py:147
[10:15:08.175867] INFO     khoj: 🔒 Schedule Leader elected                                                                           main.py:171
[10:15:08.182735] INFO     khoj: Started Background Scheduler                                                                         main.py:181
[10:15:08.521974] INFO     khoj.configure: Initializing with default config.                                                     configure.py:232
[10:15:11.747732] INFO     khoj.configure: 📡 Telemetry disabled                                                                 configure.py:280
[10:15:11.748468] INFO     khoj: 🌖 Khoj is ready to engage                                                                           main.py:220
[10:15:11.756719] INFO     uvicorn.error: Started server process [19450]                                                             server.py:82
[10:15:11.757779] INFO     uvicorn.error: Waiting for application startup.                                                               on.py:48
[10:15:11.758388] INFO     uvicorn.error: Application startup complete.                                                                  on.py:62
[10:15:11.758986] INFO     uvicorn.error: Uvicorn running on http://127.0.0.1:42110 (Press CTRL+C to quit)                          server.py:214
[10:15:50.291325] INFO     uvicorn.access: 127.0.0.1:44766 - "GET / HTTP/1.1" 200                                                 h11_impl.py:476
...
[10:15:57.629975] INFO     uvicorn.access: 127.0.0.1:44780 - "GET /chat?conversationId=e2fd08ad-c4d9-4049-9d0a-7a389b9a45aa       h11_impl.py:476
                           HTTP/1.1" 200                                                                                                      
...
[10:15:57.966474] INFO     uvicorn.access: 127.0.0.1:44766 - "GET /api/chat/sessions HTTP/1.1" 200                                h11_impl.py:476
[10:15:57.973602] INFO     khoj.routers.helpers: Loading Offline Chat Model...                                                     helpers.py:173
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16384,00 MiB on device 0: cudaMalloc failed: out of memory
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
[10:16:04.482772] INFO     uvicorn.access: 127.0.0.1:44784 - "GET /api/settings?detailed=true HTTP/1.1" 200                       h11_impl.py:476
[10:16:04.489976] ERROR    uvicorn.error: Exception in ASGI application                                                           h11_impl.py:411

The full log is attached.

khoj-error.log
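
As a rough sanity check on the 16384,00 MiB figure in the log excerpt above, the sketch below estimates the size of a transformer KV cache from the model shape and context length. The model shape used here (32 layers, 8 KV heads, head dimension 128, 16-bit cache) and the context lengths are assumptions chosen for illustration; the log does not say which model or context size Khoj tried to load. `kv_cache_mib` is a hypothetical helper, not a Khoj or llama.cpp function.

```python
def kv_cache_mib(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size in MiB.

    Each layer stores one K and one V tensor (the factor 2), each holding
    n_ctx * n_kv_heads * head_dim elements of bytes_per_elem bytes.
    """
    total_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
    return total_bytes / (1024 ** 2)


# Assumed model shape (NOT taken from the log), at two context lengths.
print(kv_cache_mib(n_layers=32, n_ctx=131072, n_kv_heads=8, head_dim=128))  # 16384.0
print(kv_cache_mib(n_layers=32, n_ctx=8192, n_kv_heads=8, head_dim=128))    # 1024.0
```

Under these assumptions, a 131072-token context would need exactly the 16384 MiB that the allocator requested, while an 8192-token context would need about 1 GiB. If that is what happened here, a smaller context window might avoid the failure, but I have not verified this.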

Expected Behavior

Khoj works normally.

Reproduction Steps

  1. Have Debian 12.11 installed with default Python 3.11.2 from the official repositories.
  2. Have CUDA 12.9 installed from the NVIDIA local repository.
  3. Install Khoj as described in this issue.
  4. Open a web browser at http://localhost:42110/, click Show all, then Create Image, type "Paint a picture of cat" and press Enter.

Possible Workaround

No response

Additional Information

$ nvidia-smi 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    On  |   00000000:01:00.0  On |                  Off |
| 30%   38C    P8              9W /   70W |     418MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2404      G   /usr/lib/xorg/Xorg                      372MiB |
+-----------------------------------------------------------------------------------------+
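
For comparison with the failed allocation, here is a minimal sketch that checks a requested buffer size against the memory figures reported by nvidia-smi. The input line mimics the output of `nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader,nounits`, filled in with the values from the table above; `parse_memory_csv` and `fits` are hypothetical helper names, and the requested size is the one from the `cudaMalloc` error in the console log.

```python
def parse_memory_csv(line):
    """Parse one 'total, used' line (values in MiB) into two integers."""
    total, used = (int(field.strip()) for field in line.split(","))
    return total, used


def fits(requested_mib, total_mib, used_mib):
    """Return True if the request is no larger than the currently free memory."""
    return requested_mib <= total_mib - used_mib


# Values copied from the nvidia-smi table above: 16380 MiB total, 418 MiB used.
total, used = parse_memory_csv("16380, 418")
requested = 16384  # MiB, from the cudaMalloc error in the console log

print(total - used)                  # 15962
print(fits(requested, total, used))  # False
```

The single 16384 MiB request is larger than the card's total memory of 16380 MiB, so it cannot succeed regardless of what else is running on the GPU.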

Link to Discord or Github discussion

No response

Labels: fix (Fix something that isn't working as expected), question (Further information is requested)
