Khoj does not work because of "cudaMalloc failed: out of memory" on Debian 12.11 with CUDA 12.9 #1215

### Server

- [ ] Cloud (https://app.khoj.dev)
- [ ] Self-Hosted Docker
- [x] Self-Hosted Python package
- [ ] Self-Hosted source code

### Clients

- [x] Web browser
- [ ] Desktop/mobile app
- [ ] Obsidian
- [ ] Emacs
- [ ] WhatsApp

### OS

- [ ] Windows
- [ ] macOS
- [x] Linux
- [ ] Android
- [ ] iOS

### Khoj version

1.42.10

### Describe the bug

I'm using Debian 12.11 with CUDA 12.9.
All other local GPU-based LLM solutions, such as LocalAI and Open-WebUI, work normally on this machine, and the LocalScore benchmark also runs.
The GPU is an NVIDIA RTX 2000 Ada Generation with 16 GB of VRAM.

I installed Khoj using the commands below:

```
cd ~/Software
mkdir khoj
cd khoj
python3 -m venv .venv
source .venv/bin/activate

export CUDACXX=/usr/local/cuda-12.9/bin/nvcc
export CUDA_PATH=/usr/local/cuda-12.9
export CUDAToolkit_ROOT=/usr/local/cuda-12.9/

CMAKE_ARGS="-DGGML_CUDA=on -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.9/targets/x86_64-linux/include" FORCE_CMAKE=1 python -m pip install 'khoj[local]'
```

Then I ran it with:

```
cd ~/Software/khoj
source .venv/bin/activate

USE_EMBEDDED_DB="true" KHOJ_TELEMETRY_DISABLE="true" khoj --anonymous-mode
```

### Current Behavior

The most relevant lines from the console log:

```
[10:15:08.169977] INFO     khoj: 🌘 Starting Khoj                                                                                     main.py:147
[10:15:08.175867] INFO     khoj: 🔒 Schedule Leader elected                                                                           main.py:171
[10:15:08.182735] INFO     khoj: Started Background Scheduler                                                                         main.py:181
[10:15:08.521974] INFO     khoj.configure: Initializing with default config.                                                     configure.py:232
[10:15:11.747732] INFO     khoj.configure: 📡 Telemetry disabled                                                                 configure.py:280
[10:15:11.748468] INFO     khoj: 🌖 Khoj is ready to engage                                                                           main.py:220
[10:15:11.756719] INFO     uvicorn.error: Started server process [19450]                                                             server.py:82
[10:15:11.757779] INFO     uvicorn.error: Waiting for application startup.                                                               on.py:48
[10:15:11.758388] INFO     uvicorn.error: Application startup complete.                                                                  on.py:62
[10:15:11.758986] INFO     uvicorn.error: Uvicorn running on http://127.0.0.1:42110 (Press CTRL+C to quit)                          server.py:214
[10:15:50.291325] INFO     uvicorn.access: 127.0.0.1:44766 - "GET / HTTP/1.1" 200                                                 h11_impl.py:476
...
[10:15:57.629975] INFO     uvicorn.access: 127.0.0.1:44780 - "GET /chat?conversationId=e2fd08ad-c4d9-4049-9d0a-7a389b9a45aa       h11_impl.py:476
                           HTTP/1.1" 200                                                                                                      
...
[10:15:57.966474] INFO     uvicorn.access: 127.0.0.1:44766 - "GET /api/chat/sessions HTTP/1.1" 200                                h11_impl.py:476
[10:15:57.973602] INFO     khoj.routers.helpers: Loading Offline Chat Model...                                                     helpers.py:173
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16384,00 MiB on device 0: cudaMalloc failed: out of memory
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
[10:16:04.482772] INFO     uvicorn.access: 127.0.0.1:44784 - "GET /api/settings?detailed=true HTTP/1.1" 200                       h11_impl.py:476
[10:16:04.489976] ERROR    uvicorn.error: Exception in ASGI application                                                           h11_impl.py:411
```
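The 16384 MiB request can be sanity-checked with a back-of-the-envelope KV-cache estimate. This is a rough sketch, not Khoj's or llama.cpp's own code: it assumes a Llama-3.1-8B-class model (32 layers, 8 KV heads, head dimension 128) with an f16 (2-byte) cache, and the helper `kv_cache_mib` is hypothetical. Under those assumptions each token costs 2 × 32 × 8 × 128 × 2 bytes = 128 KiB, so 16384 MiB corresponds to a 131072-token (128K) context.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough KV-cache size (MiB) for a llama.cpp context.
# ASSUMPTION: defaults describe a Llama-3.1-8B-class model with an f16 cache;
# the model Khoj actually loads may have a different architecture.
kv_cache_mib() {
  local n_ctx=$1
  local n_layers=${2:-32} n_kv_heads=${3:-8} head_dim=${4:-128} bytes=${5:-2}
  # Leading factor 2 = one K tensor and one V tensor per layer.
  local bytes_total=$(( 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes ))
  echo $(( bytes_total / 1024 / 1024 ))
}

kv_cache_mib 131072   # full 128K context window
kv_cache_mib 8192     # a much smaller context, for comparison
```

If the loaded model is of this class, the failure would be consistent with the context being sized to the model's full 128K window; the log alone does not confirm this.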

The full log is attached.

[khoj-error.log](https://github.com/user-attachments/files/21811762/khoj-error.log)

### Expected Behavior

Khoj works normally.

### Reproduction Steps

1. Install Debian 12.11 with the default Python 3.11.2 from the official repositories.
2. Install CUDA 12.9 from the NVIDIA local repository.
3. Install Khoj as described in this issue.
4. Open a web browser at http://localhost:42110/, click *Show all*, then *Create Image*, type `Paint a picture of cat`, and press Enter.

### Possible Workaround

_No response_

### Additional Information

```
$ nvidia-smi 
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    On  |   00000000:01:00.0  On |                  Off |
| 30%   38C    P8              9W /   70W |     418MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2404      G   /usr/lib/xorg/Xorg                      372MiB |
+-----------------------------------------------------------------------------------------+

```
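The `nvidia-smi` numbers above can be compared directly with the failed allocation. The sketch below hard-codes 16380 MiB total and 418 MiB used from that output, plus the 16384 MiB request from the log. The helper `max_ctx_tokens` and its `reserve` argument (a budget for model weights and compute buffers) are hypothetical, and its 128 KiB-per-token default assumes a Llama-3.1-8B-class model with an f16 cache.

```bash
#!/usr/bin/env bash
# Values copied from the nvidia-smi output and the error log in this issue.
# On a live system they could be read with:
#   nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader,nounits
total_mib=16380
used_mib=418
requested_mib=16384   # from the ggml_backend_cuda_buffer_type_alloc_buffer line

free_mib=$(( total_mib - used_mib ))
echo "free: ${free_mib} MiB, requested: ${requested_mib} MiB"
if (( requested_mib > free_mib )); then
  echo "KV cache alone does not fit (short by $(( requested_mib - free_mib )) MiB)"
fi

# Hypothetical helper: largest context whose KV cache fits in (free - reserve) MiB.
# ASSUMPTION: 131072 bytes (128 KiB) of f16 KV cache per token.
max_ctx_tokens() {
  local free=$1 reserve=$2 per_token_bytes=${3:-131072}
  echo $(( (free - reserve) * 1024 * 1024 / per_token_bytes ))
}

max_ctx_tokens "$free_mib" 0   # upper bound, ignoring model weights entirely
```

With about 15962 MiB free, the requested KV cache exceeds both the free memory and the card's total 16380 MiB before any model weights are counted, so a context well below 128K tokens would likely be needed on this GPU.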


### Link to Discord or Github discussion

_No response_
