IPEX-LLM provides llama.cpp support for running GGUF models on Intel NPU. This guide demonstrates how to use the llama.cpp NPU portable zip to run GGUF models directly on Intel NPU, without any manual installation.
Important
- IPEX-LLM currently only supports Windows on Intel NPU.
- Only meta-llama/Llama-3.2-3B-Instruct, deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and deepseek-ai/DeepSeek-R1-Distill-Qwen-7B are supported.
- Prerequisites
- Step 1: Download and Unzip
- Step 2: Setup
- Step 3: Run GGUF Model
- More details
- Troubleshooting
Prerequisites
Check your NPU driver version, and update it if needed:
- Please use NPU driver version 32.0.100.3104.
- Refer to here for details on how to update the NPU driver.
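If you want to check the installed driver version from the command line rather than Device Manager, one way is to query WMI from Command Prompt. The device name filter below is an assumption (on most machines the Intel NPU appears as "Intel(R) AI Boost" under "Neural processors" in Device Manager):
REM Query the NPU driver version via WMI; the "AI Boost" device name may differ on your machine
powershell -Command "Get-CimInstance Win32_PnPSignedDriver | Where-Object DeviceName -like '*AI Boost*' | Select-Object DeviceName, DriverVersion"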
Step 1: Download and Unzip
Download IPEX-LLM llama.cpp NPU portable zip for Windows users from the link.
Then, extract the zip file to a folder.
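If you prefer to do this step from the command line, recent Windows versions ship a tar.exe that can extract zip archives. The folder and file names below are placeholders; substitute the actual name of the zip you downloaded:
REM Create a destination folder and extract the portable zip into it (placeholder names)
mkdir C:\llama-cpp-npu
tar -xf llama-cpp-ipex-llm-npu-win.zip -C C:\llama-cpp-npu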
- Open "Command Prompt" (cmd), and enter the extracted folder through
cd /d PATH\TO\EXTRACTED\FOLDER
- Runtime configuration based on your device:
-
For Intel Core™ Ultra Processors (Series 2) with processor number 2xxV (code name Lunar Lake):
-
For Intel Core™ Ultra 7 Processor 258V: No runtime configuration required.
-
For Intel Core™ Ultra 5 Processor 228V & 226V:
set IPEX_LLM_NPU_DISABLE_COMPILE_OPT=1
-
-
For Intel Core™ Ultra Processors (Series 2) with processor number 2xxK or 2xxH (code name Arrow Lake):
set IPEX_LLM_NPU_ARL=1
-
For Intel Core™ Ultra Processors (Series 1) with processor number 1xxH (code name Meteor Lake):
set IPEX_LLM_NPU_MTL=1
-
Step 3: Run GGUF Model
You can then use the CLI tool to run GGUF models on Intel NPU by running llama-cli-npu.exe in the "Command Prompt", for example:
llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"
Note
- The maximum number of input tokens currently supported is 960, and the maximum sequence length (input plus output tokens) is 1024.
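Putting Step 2 and Step 3 together, a complete Command Prompt session on, for example, an Arrow Lake machine would look like the following; the folder path is a placeholder, and the model file is the same GGUF used above:
REM Enter the extracted folder, set the Arrow Lake runtime configuration, then run the model
cd /d C:\llama-cpp-npu
set IPEX_LLM_NPU_ARL=1
llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"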
Troubleshooting
First, verify that your NPU driver version meets the requirement. Then, check the runtime configuration for your device. Please also note the difference between Command Prompt and Windows PowerShell: taking Arrow Lake as an example, you need to use set IPEX_LLM_NPU_ARL=1 in Command Prompt, but $env:IPEX_LLM_NPU_ARL = "1" in Windows PowerShell.
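To confirm that the variable is actually set in the shell you are about to launch llama-cli-npu.exe from, you can simply print it back; in PowerShell the equivalent is echo $env:IPEX_LLM_NPU_ARL:
REM In Command Prompt, a literal "%IPEX_LLM_NPU_ARL%" output means the variable is not set
echo %IPEX_LLM_NPU_ARL%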