Is there a plan to support NPU? #30
Comments
same question |
Hoping to see support for the Snapdragon X Elite's NPU |
Bump. What's the plan to support Snapdragon's X Elite NPU in the Technical Preview? |
Pumping :-) |
Up |
Hello, any update on this ? |
From what I read, we basically have to wait until NPU support is added to llama.cpp, since that's what LM Studio uses for inference. |
Any update on the NPU? |
when will this be released? https://www.tiktok.com/@shanselman/video/7439403306352528671 |
bump |
up |
daily bump. |
I also would love it |
Now that 3.6 is out, it would be really cool for a beta update for NPU. I'd like my Ultra 155H NPU not to go to waste. |
Seems like we are going to get a build that can run LM Studio on the NPU very soon. https://www.linkedin.com/posts/shanselman_wow-check-this-out-its-lmstudioai-running-activity-7265044845801418752-Cc1p?utm_source=share&utm_medium=member_android However, I have tried running an LLM on the NPU with AnythingLLM, and it doesn't show a big jump compared to the CPU. |
Running on the NPU is probably more power-efficient than on the CPU, though |
+1 Need support to Intel NPU ヘ( ̄ω ̄ヘ) |
AFAIK only AnythingLLM is able to use the NPU for running a model, and there are just two Llama 8B models to choose from when running through the NPU. |
Microsoft used the ONNX QDQ format to get LLMs running on the NPU, with the AI Toolkit VS Code extension.
|
If you download AI toolkit in Visual Studio Code you can run DeepSeek r1
using the NPU locally
…On Fri, Feb 14, 2025 at 1:46 PM ofoacimr ***@***.***> wrote:
Microsoft used ONNX QDQ format to make LLMs running on NPU, with AI
Toolkit VS Code extension.
https://blogs.windows.com/windowsdeveloper/2025/01/29/running-distilled-deepseek-r1-models-locally-on-copilot-pcs-powered-by-windows-copilot-runtime/
While the Qwen 1.5B release from DeepSeek does have an int4 variant, it
does not directly map to the NPU due to presence of dynamic input shapes
and behavior – all of which needed optimizations to make compatible and
extract the best efficiency. Additionally, we use the ONNX QDQ
<https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html>
format to enable scaling across a variety of NPUs we have in the Windows
ecosystem. We work out an optimal operator layout between the CPU and NPU
for maximum power-efficiency and speed.
|
thanks, just on windows though? anything for linux? |
Maybe there is more info here: https://code.visualstudio.com/docs/setup/linux |
Thanks, that supports the Qualcomm NPU, not Intel, but appreciate it either way. Unfortunately Intel locked its NPU into a tight little OpenVINO Toolkit world. It's actually fantastic for image recognition and object detection. And that is all, apparently lol |
Another MS SL7/arm64 + npu owner here. Bump! |
+1 Intel Ultra 9 185H. Intel Al Boost. |
+1 please 🙏 |
+1 |
Snapdragon X Elite and Core Ultra have NPUs built-in. However, it seems that the capabilities of the NPU can only be utilized with software that supports it. Does LM Studio have any plans to support these NPUs?
On the new Copilot+ PC standard computers, the GPU is integrated into the CPU and shares memory with the main system RAM, so graphics capability is a concern compared to laptops with NVIDIA GPUs. That is why, even though these computers are touted as AI-ready, some workloads use the NPU instead of the GPU. It would be very helpful if LM Studio could also support NPUs.