Hi, with the latest intelanalytics/ipex-llm-inference-cpp-xpu:latest image I get this error:
UR backend failed. UR backend returns:40 (UR_RESULT_ERROR_OUT_OF_RESOURCES) Exception caught at file:/home/runner/_work/llm.cpp/llm.cpp/llama-cpp-bigdl/ggml/src/ggml-sycl/ggml-sycl.cpp, line:2819
I use this command line: ./llama-server -m Mistral-Small-24B-Instruct-2501-IQ4_XS.gguf -c 2048 -ngl 99 --temp 0 --port 1234 --host 192.168.1.64
I am also experiencing this UR backend returns:40 error, specifically with nomic_embed_text. Right now, my workaround is to use this command:
set OLLAMA_NUM_GPU=11
(note: nomic_embed_text has a total of 13 layers)
Any value above that results in the error; capping the layer count this way forces the CPU to handle part of the work instead of leaving everything to the GPU.
Oddly enough, setting this parameter to 999, as instructed in the docs, works just fine with any other model.
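For reference, a minimal sketch of the same workaround on Linux (the set syntax above is Windows; the export form and the ollama serve entry point are assumptions based on the usual ipex-llm Ollama setup, so adjust to your install):
export OLLAMA_NUM_GPU=11   # offload only 11 of nomic_embed_text's 13 layers to the GPU
./ollama serve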
@qiuxin2012 I'm not sure it's memory-related.
It's a 24B model, but quantized: model size = 11820.33 MiB + context = 1064.00 MiB. That should be OK for 16 GB of RAM.
And it worked with an older version of the image.
After @san-nos's comment, I tried a few options, and with --batch-size <= 1024 it works.
Is the default logical maximum batch size of 2048 the problem?
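For anyone hitting the same thing, this is the invocation that works for me, assuming only the batch size matters (same flags as my original command, with --batch-size added; 1024 was the largest value that ran without the error):
./llama-server -m Mistral-Small-24B-Instruct-2501-IQ4_XS.gguf -c 2048 -ngl 99 --temp 0 --port 1234 --host 192.168.1.64 --batch-size 1024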