Run llama.cpp Portable Zip on Intel NPU with IPEX-LLM


IPEX-LLM provides llama.cpp support for running GGUF models on Intel NPU. This guide shows how to use the llama.cpp NPU portable zip to run models directly on an Intel NPU, with no manual installation required.

Important

On Intel NPU, IPEX-LLM currently supports Windows only.
Only the following models are supported: meta-llama/Llama-3.2-3B-Instruct, deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and deepseek-ai/DeepSeek-R1-Distill-Qwen-7B.

Prerequisites

Check your NPU driver version and update it if needed:

Use NPU driver version 32.0.100.3104.
See here for details about updating the NPU driver.
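
As a rough aid for the driver check, the sketch below compares a driver version string that you supply yourself (for example, copied from Device Manager) against the version named in this guide. The helper names are hypothetical and not part of IPEX-LLM, and the guide does not state whether drivers newer than 32.0.100.3104 also work, so the check only reports an exact match or an older version.

```python
REQUIRED_NPU_DRIVER = "32.0.100.3104"  # version named in this guide


def parse_driver_version(version: str) -> tuple[int, ...]:
    """Split a dotted driver version such as '32.0.100.3104' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_matches(installed: str, required: str = REQUIRED_NPU_DRIVER) -> bool:
    """Return True when the installed version equals the required one."""
    return parse_driver_version(installed) == parse_driver_version(required)


def driver_is_older(installed: str, required: str = REQUIRED_NPU_DRIVER) -> bool:
    """Return True when the installed version is lower than the required one."""
    return parse_driver_version(installed) < parse_driver_version(required)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating "100" as smaller than "99".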

Step 1: Download and Unzip

Download the IPEX-LLM llama.cpp NPU portable zip for Windows from the link.

Then, extract the zip file to a folder.

Step 2: Setup

Open "Command Prompt" (cmd) and change to the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`.
Then set the runtime configuration for your device:
- For Intel Core™ Ultra Processors (Series 2) with processor number 2xxV (code name Lunar Lake):
  - For Intel Core™ Ultra 7 Processor 258V: No runtime configuration required.
  - For Intel Core™ Ultra 5 Processor 228V & 226V:
```
set IPEX_LLM_NPU_DISABLE_COMPILE_OPT=1
```
- For Intel Core™ Ultra Processors (Series 2) with processor number 2xxK or 2xxH (code name Arrow Lake):
```
set IPEX_LLM_NPU_ARL=1
```
- For Intel Core™ Ultra Processors (Series 1) with processor number 1xxH (code name Meteor Lake):
```
set IPEX_LLM_NPU_MTL=1
```
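
The device-to-variable mapping above can be summarized in a small lookup. This is a minimal sketch: `npu_runtime_env` is a hypothetical helper, not an IPEX-LLM API, and the regular expression that picks the processor number out of a marketing name is an assumption made for illustration. Processors that this guide does not list raise an error rather than guessing a configuration.

```python
import re


def npu_runtime_env(processor_name: str) -> dict[str, str]:
    """Return the environment variables this guide lists for a processor.

    `processor_name` is a marketing name such as "Intel Core Ultra 5 226V".
    The parsing below is an illustrative assumption, not an official rule.
    """
    # Look for a three-digit processor number followed by one suffix letter.
    match = re.search(r"\b([12])(\d{2})([A-Z])\b", processor_name.upper())
    if match is None:
        raise ValueError(f"no processor number found in {processor_name!r}")
    series, digits, suffix = match.groups()
    number = series + digits

    if series == "2" and suffix == "V":  # Lunar Lake
        if number == "258":
            return {}  # no runtime configuration required
        if number in ("228", "226"):
            return {"IPEX_LLM_NPU_DISABLE_COMPILE_OPT": "1"}
    elif series == "2" and suffix in ("K", "H"):  # Arrow Lake
        return {"IPEX_LLM_NPU_ARL": "1"}
    elif series == "1" and suffix == "H":  # Meteor Lake
        return {"IPEX_LLM_NPU_MTL": "1"}

    raise ValueError(f"{processor_name!r} is not covered by this guide")
```

Each key/value pair in the returned dictionary corresponds to one `set NAME=VALUE` command in Command Prompt.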

Step 3: Run GGUF Model

You can then run GGUF models on the Intel NPU by running the CLI tool llama-cli-npu.exe in "Command Prompt", as follows:

llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"

Note

Currently, the maximum number of input tokens is 960, and the maximum sequence length (input plus output tokens) is 1024.
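
To make the arithmetic behind these limits concrete, the sketch below checks a prompt length and an `-n` value against them. It assumes the 1024 limit applies to the sum of prompt tokens and generated tokens. The function names are hypothetical, and the token counts must come from your own tokenizer. What the runtime does when a limit is exceeded is not covered here.

```python
MAX_INPUT_TOKENS = 960       # maximum input tokens stated in the note
MAX_SEQUENCE_TOKENS = 1024   # maximum input + output tokens stated in the note


def fits_npu_limits(prompt_tokens: int, n_predict: int) -> bool:
    """Check a prompt length and an -n value against both limits."""
    if prompt_tokens < 0 or n_predict < 0:
        raise ValueError("token counts must be non-negative")
    within_input = prompt_tokens <= MAX_INPUT_TOKENS
    within_sequence = prompt_tokens + n_predict <= MAX_SEQUENCE_TOKENS
    return within_input and within_sequence


def max_new_tokens(prompt_tokens: int) -> int:
    """Largest -n value that keeps the whole sequence within the limit."""
    if not 0 <= prompt_tokens <= MAX_INPUT_TOKENS:
        raise ValueError("prompt length is outside the supported input range")
    return MAX_SEQUENCE_TOKENS - prompt_tokens
```

Under this reading, a prompt at the 960-token input limit leaves room for at most 64 generated tokens.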

Troubleshooting

`L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004` error

First, verify that your NPU driver version meets the requirement. Then check that the runtime configuration matches your device. Also note the difference between Command Prompt and Windows PowerShell: taking Arrow Lake as an example, use `set IPEX_LLM_NPU_ARL=1` in Command Prompt but `$env:IPEX_LLM_NPU_ARL = "1"` in Windows PowerShell.
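
One way to sidestep the shell-syntax difference is to pass the variable to the process directly instead of setting it in the shell. The sketch below does this with Python's standard `subprocess` module. `build_env`, `build_command` and `run_llama_cli` are hypothetical helper names, and the command line simply mirrors the example in Step 3. This is an illustration, not a launcher shipped with IPEX-LLM.

```python
import os
import subprocess


def build_env(extra: dict[str, str]) -> dict[str, str]:
    """Copy the current environment and add the runtime configuration."""
    env = dict(os.environ)
    env.update(extra)
    return env


def build_command(exe_path: str, model_path: str, prompt: str, n_predict: int) -> list[str]:
    """Assemble the same arguments used in the Step 3 example."""
    return [exe_path, "-m", model_path, "-n", str(n_predict), "--prompt", prompt]


def run_llama_cli(exe_path: str, model_path: str, prompt: str,
                  n_predict: int, extra_env: dict[str, str]) -> int:
    """Run the CLI with the extra variables set and return its exit code."""
    command = build_command(exe_path, model_path, prompt, n_predict)
    completed = subprocess.run(command, env=build_env(extra_env), check=False)
    return completed.returncode
```

For an Arrow Lake device, for example, you would call `run_llama_cli("llama-cli-npu.exe", "DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf", "What is AI?", 32, {"IPEX_LLM_NPU_ARL": "1"})` from the extracted folder. Because the argument list is passed without a shell, the same call behaves identically whether Python was started from Command Prompt or PowerShell.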
