Labels
fix (Fix something that isn't working as expected), question (Further information is requested)
Description
Server
- Cloud (https://app.khoj.dev)
- Self-Hosted Docker
- Self-Hosted Python package
- Self-Hosted source code
Clients
- Web browser
- Desktop/mobile app
- Obsidian
- Emacs
OS
- Windows
- macOS
- Linux
- Android
- iOS
Khoj version
1.42.10
Describe the bug
I'm using Debian 12.11 with CUDA 12.9.
Other local GPU-based LLM tools such as LocalAI and Open-WebUI work normally on this machine, and the LocalScore benchmark also runs.
The GPU is an NVIDIA RTX 2000 Ada Generation with 16 GB of VRAM.
I installed Khoj using the commands below:
cd ~/Software
mkdir khoj
cd khoj
python3 -m venv .venv
source .venv/bin/activate
export CUDACXX=/usr/local/cuda-12.9/bin/nvcc
export CUDA_PATH=/usr/local/cuda-12.9
export CUDAToolkit_ROOT=/usr/local/cuda-12.9/
CMAKE_ARGS="-DGGML_CUDA=on -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12.9/targets/x86_64-linux/include" FORCE_CMAKE=1 python -m pip install 'khoj[local]'
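Before rebuilding, it may help to confirm that the CUDA locations exported above exist on disk. The following is a minimal sketch: the variable names are taken from the install commands, while the helper `missing_cuda_paths` is hypothetical and not part of Khoj or llama-cpp-python.

```python
import os

# Variable names come from the install commands above; the helper itself is hypothetical.
CUDA_ENV_VARS = ("CUDACXX", "CUDA_PATH", "CUDAToolkit_ROOT")


def missing_cuda_paths(env=None):
    """Return {variable: value} for variables that are unset or point at a nonexistent path."""
    env = os.environ if env is None else env
    problems = {}
    for name in CUDA_ENV_VARS:
        value = env.get(name)
        if not value or not os.path.exists(value):
            problems[name] = value
    return problems


if __name__ == "__main__":
    # Run inside the same shell session that exported the variables.
    for name, value in missing_cuda_paths().items():
        print(f"{name} is unset or does not exist: {value!r}")
```

An empty result only shows that the paths exist. It does not prove that the CUDA backend was compiled in.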
Then I ran it with:
cd ~/Software/khoj
source .venv/bin/activate
USE_EMBEDDED_DB="true" KHOJ_TELEMETRY_DISABLE="true" khoj --anonymous-mode
Current Behavior
The most relevant lines from the console log:
[10:15:08.169977] INFO khoj: 🌘 Starting Khoj main.py:147
[10:15:08.175867] INFO khoj: 🔒 Schedule Leader elected main.py:171
[10:15:08.182735] INFO khoj: Started Background Scheduler main.py:181
[10:15:08.521974] INFO khoj.configure: Initializing with default config. configure.py:232
[10:15:11.747732] INFO khoj.configure: 📡 Telemetry disabled configure.py:280
[10:15:11.748468] INFO khoj: 🌖 Khoj is ready to engage main.py:220
[10:15:11.756719] INFO uvicorn.error: Started server process [19450] server.py:82
[10:15:11.757779] INFO uvicorn.error: Waiting for application startup. on.py:48
[10:15:11.758388] INFO uvicorn.error: Application startup complete. on.py:62
[10:15:11.758986] INFO uvicorn.error: Uvicorn running on http://127.0.0.1:42110 (Press CTRL+C to quit) server.py:214
[10:15:50.291325] INFO uvicorn.access: 127.0.0.1:44766 - "GET / HTTP/1.1" 200 h11_impl.py:476
...
[10:15:57.629975] INFO uvicorn.access: 127.0.0.1:44780 - "GET /chat?conversationId=e2fd08ad-c4d9-4049-9d0a-7a389b9a45aa HTTP/1.1" 200 h11_impl.py:476
...
[10:15:57.966474] INFO uvicorn.access: 127.0.0.1:44766 - "GET /api/chat/sessions HTTP/1.1" 200 h11_impl.py:476
[10:15:57.973602] INFO khoj.routers.helpers: Loading Offline Chat Model... helpers.py:173
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16384,00 MiB on device 0: cudaMalloc failed: out of memory
llama_kv_cache_init: failed to allocate buffer for kv cache
llama_new_context_with_model: llama_kv_cache_init() failed for self-attention cache
[10:16:04.482772] INFO uvicorn.access: 127.0.0.1:44784 - "GET /api/settings?detailed=true HTTP/1.1" 200 h11_impl.py:476
[10:16:04.489976] ERROR uvicorn.error: Exception in ASGI application h11_impl.py:411
The full log is attached.
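The failing call requests 16384,00 MiB for the KV cache alone. As a rough sketch of how a figure of that size can arise, a key/value cache grows linearly with context length. The layer count, KV head count, head dimension and 2-byte (f16) element size below are illustrative assumptions, not values read from the log or from Khoj's configuration.

```python
def kv_cache_mib(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    """Size in MiB of a key/value cache: one K and one V tensor per layer."""
    total_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
    return total_bytes / (1024 * 1024)


# Hypothetical architecture values, NOT taken from the log or from Khoj.
ASSUMED = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for n_ctx in (8192, 32768, 131072):
    print(n_ctx, kv_cache_mib(n_ctx=n_ctx, **ASSUMED), "MiB")
```

Under these assumed values a 131072-token context needs 16384 MiB, which matches the figure in the log, while an 8192-token context needs 1024 MiB. This does not confirm which model or context size Khoj actually loaded. It only illustrates that a very large context window can by itself exceed a 16 GB card.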
Expected Behavior
Khoj works normally, without the out-of-memory error.
Reproduction Steps
- Have Debian 12.11 installed with the default Python 3.11.2 from the official repositories.
- Have CUDA 12.9 installed from the NVIDIA local repository.
- Install Khoj as described in this issue.
- Open a web browser at http://localhost:42110/, click Show all, then Create Image, type
Paint a picture of cat
and press Enter.
Possible Workaround
No response
Additional Information
$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 2000 Ada Gene...    On  |   00000000:01:00.0  On |                  Off |
| 30%   38C    P8              9W /   70W |     418MiB /  16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2404      G   /usr/lib/xorg/Xorg                      372MiB |
+-----------------------------------------------------------------------------------------+
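Putting the allocation size from the log next to the memory figures above shows the shortfall directly. This is a small sketch using only numbers already quoted in this issue. The two parsing helpers are hypothetical, and the `nvidia-smi` snapshot was not necessarily taken at the moment of the failure.

```python
import re


def parse_requested_mib(log_line):
    """Extract the MiB figure from a ggml allocation message.

    Accepts ',' or '.' as the decimal separator; assumes no thousands separators.
    """
    match = re.search(r"allocating\s+([\d.,]+)\s+MiB", log_line)
    if match is None:
        raise ValueError("no allocation size found")
    return float(match.group(1).replace(",", "."))


def parse_memory_usage(smi_row):
    """Extract (used_mib, total_mib) from the Memory-Usage cell of an nvidia-smi row."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", smi_row)
    if match is None:
        raise ValueError("no memory usage found")
    return int(match.group(1)), int(match.group(2))


# Both strings are copied from this issue.
LOG_LINE = ("ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16384,00 MiB "
            "on device 0: cudaMalloc failed: out of memory")
SMI_ROW = "| 30% 38C P8 9W / 70W | 418MiB / 16380MiB | 0% Default |"

requested = parse_requested_mib(LOG_LINE)
used, total = parse_memory_usage(SMI_ROW)
free = total - used
print(f"requested {requested:.0f} MiB, free {free} MiB, shortfall {requested - free:.0f} MiB")
```

With these figures the single KV cache buffer (16384 MiB) is larger than the card's total memory (16380 MiB), so the allocation cannot succeed even with nothing else running on the GPU.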
Link to Discord or Github discussion
No response