airockchip/rknn-llm

Description

The RKLLM software stack helps users quickly deploy AI models to Rockchip chips.

To use the RKNPU, users first run the RKLLM-Toolkit on a PC to convert the trained model into an RKLLM-format model, and then run inference on the development board through the RKLLM C API; a conversion sketch follows the component list below.

  • RKLLM-Toolkit is a software development kit for performing model conversion and quantization on a PC.

  • RKLLM Runtime provides C/C++ programming interfaces for the Rockchip NPU platform, helping users deploy RKLLM models and accelerate LLM applications.

  • The RKNPU kernel driver is responsible for interacting with the NPU hardware. It is open source and can be found in the Rockchip kernel code.
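
For reference, here is a minimal sketch of the PC-side conversion step using the RKLLM-Toolkit Python API. The model path, quantization dtype, and target platform are illustrative, and exact parameter names may differ between toolkit versions:

```python
from rkllm.api import RKLLM

# Illustrative path; point this at your local Hugging Face model directory.
MODEL_PATH = "./Qwen2-0.5B-Instruct"

llm = RKLLM()

# Load the trained Hugging Face model on the PC.
ret = llm.load_huggingface(model=MODEL_PATH)
assert ret == 0, "model load failed"

# Quantize and build for the target Rockchip platform.
ret = llm.build(do_quantization=True,
                quantized_dtype="w8a8",     # one of the dtypes from the benchmark table
                target_platform="rk3588")
assert ret == 0, "build failed"

# Export the .rkllm file that the board-side RKLLM Runtime consumes.
ret = llm.export_rkllm("./qwen2_0.5b_w8a8.rkllm")
assert ret == 0, "export failed"
```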

Supported Platforms

  • RK3588 Series
  • RK3576 Series
  • RK3562 Series

Supported Models

Model Performance Benchmark

| llm model | platform | dtype | seqlen | max_context | new_tokens | TTFT (ms) | Tokens/s | memory (GB) |
|---|---|---|---|---|---|---|---|---|
| Qwen2-0.5B | RK3562 | w4a16_g128 | 64 | 320 | 256 | 524 | 5.67 | 0.39 |
| Qwen2-0.5B | RK3562 | w4a8_g32 | 64 | 320 | 256 | 873 | 12.00 | 0.48 |
| Qwen2-0.5B | RK3562 | w8a8 | 64 | 320 | 256 | 477 | 11.50 | 0.61 |
| Qwen2-0.5B | RK3576 | w4a16 | 64 | 320 | 256 | 204 | 34.50 | 0.40 |
| Qwen2-0.5B | RK3576 | w4a16_g128 | 64 | 320 | 256 | 212 | 32.40 | 0.40 |
| Qwen2-0.5B | RK3588 | w8a8 | 64 | 320 | 256 | 79 | 41.50 | 0.62 |
| Qwen2-0.5B | RK3588 | w8a8_g128 | 64 | 320 | 256 | 183 | 25.07 | 0.75 |
| TinyLLAMA-1.1B | RK3576 | w4a16 | 64 | 320 | 256 | 345 | 21.10 | 0.77 |
| TinyLLAMA-1.1B | RK3576 | w4a16_g128 | 64 | 320 | 256 | 410 | 18.50 | 0.80 |
| TinyLLAMA-1.1B | RK3588 | w8a8 | 64 | 320 | 256 | 140 | 24.21 | 1.25 |
| TinyLLAMA-1.1B | RK3588 | w8a8_g512 | 64 | 320 | 256 | 195 | 20.08 | 1.29 |
| Qwen2-1.5B | RK3576 | w4a16 | 64 | 320 | 256 | 512 | 14.40 | 1.75 |
| Qwen2-1.5B | RK3576 | w4a16_g128 | 64 | 320 | 256 | 550 | 12.75 | 1.76 |
| Qwen2-1.5B | RK3588 | w8a8 | 64 | 320 | 256 | 206 | 16.46 | 2.47 |
| Qwen2-1.5B | RK3588 | w8a8_g128 | 64 | 320 | 256 | 725 | 7.00 | 2.65 |
| Phi-3-3.8B | RK3576 | w4a16 | 64 | 320 | 256 | 975 | 6.60 | 2.16 |
| Phi-3-3.8B | RK3576 | w4a16_g128 | 64 | 320 | 256 | 1180 | 5.85 | 2.23 |
| Phi-3-3.8B | RK3588 | w8a8 | 64 | 320 | 256 | 516 | 7.44 | 3.88 |
| Phi-3-3.8B | RK3588 | w8a8_g512 | 64 | 320 | 256 | 610 | 6.13 | 3.95 |
| ChatGLM3-6B | RK3576 | w4a16 | 64 | 320 | 256 | 1168 | 4.62 | 3.86 |
| ChatGLM3-6B | RK3576 | w4a16_g128 | 64 | 320 | 256 | 1583 | 3.82 | 3.96 |
| ChatGLM3-6B | RK3588 | w8a8 | 64 | 320 | 256 | 800 | 4.95 | 6.69 |
| ChatGLM3-6B | RK3588 | w8a8_g128 | 64 | 320 | 256 | 2190 | 2.70 | 7.18 |
| Gemma2-2B | RK3576 | w4a16 | 64 | 320 | 256 | 628 | 8.00 | 3.63 |
| Gemma2-2B | RK3576 | w4a16_g128 | 64 | 320 | 256 | 776 | 7.40 | 3.63 |
| Gemma2-2B | RK3588 | w8a8 | 64 | 320 | 256 | 342 | 9.67 | 4.84 |
| Gemma2-2B | RK3588 | w8a8_g128 | 64 | 320 | 256 | 1055 | 5.49 | 5.14 |
| InternLM2-1.8B | RK3576 | w4a16 | 64 | 320 | 256 | 475 | 13.30 | 1.59 |
| InternLM2-1.8B | RK3576 | w4a16_g128 | 64 | 320 | 256 | 572 | 11.95 | 1.62 |
| InternLM2-1.8B | RK3588 | w8a8 | 64 | 320 | 256 | 206 | 15.66 | 2.38 |
| InternLM2-1.8B | RK3588 | w8a8_g512 | 64 | 320 | 256 | 298 | 12.66 | 2.45 |
| MiniCPM3-4B | RK3576 | w4a16 | 64 | 320 | 256 | 1397 | 4.80 | 2.70 |
| MiniCPM3-4B | RK3576 | w4a16_g128 | 64 | 320 | 256 | 1645 | 4.39 | 2.80 |
| MiniCPM3-4B | RK3588 | w8a8 | 64 | 320 | 256 | 702 | 6.15 | 4.65 |
| MiniCPM3-4B | RK3588 | w8a8_g128 | 64 | 320 | 256 | 1691 | 3.42 | 5.06 |
| llama3-8B | RK3576 | w4a16 | 64 | 320 | 256 | 1608 | 3.60 | 5.63 |
| llama3-8B | RK3576 | w4a16_g128 | 64 | 320 | 256 | 2010 | 3.00 | 5.76 |
| llama3-8B | RK3588 | w8a8 | 64 | 320 | 256 | 1128 | 3.79 | 9.21 |
| llama3-8B | RK3588 | w8a8_g512 | 64 | 320 | 256 | 1281 | 3.05 | 9.45 |

| multimodal model | image input size | vision dtype | vision infer time (s) | vision memory (MB) | llm dtype | seqlen | max_context | new_tokens | TTFT (ms) | Tokens/s | llm memory (GB) | platform |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen2-VL-2B | (1, 3, 392, 392) | fp16 | 3.55 | 1436.52 | w4a16 | 256 | 384 | 128 | 2094.17 | 13.23 | 1.75 | RK3576 |
| Qwen2-VL-2B | (1, 3, 392, 392) | fp16 | 3.28 | 1436.52 | w8a8 | 256 | 384 | 128 | 856.86 | 16.19 | 2.47 | RK3588 |
| MiniCPM-V-2_6 | (1, 3, 448, 448) | fp16 | 2.40 | 1031.30 | w4a16 | 128 | 256 | 128 | 2997.70 | 3.84 | 5.50 | RK3576 |
| MiniCPM-V-2_6 | (1, 3, 448, 448) | fp16 | 3.27 | 976.98 | w8a8 | 128 | 256 | 128 | 1720.60 | 4.13 | 8.88 | RK3588 |
  • The performance data above were collected with each platform's CPU and NPU running at their maximum frequencies.
  • The frequency-setting script is located in the scripts directory.
  • The vision models were tested on all NPU cores with rknn-toolkit2 version 2.2.0.

Performance Testing Methods

  1. Run the frequency-setting script from the scripts directory on the target platform.
  2. Execute `export RKLLM_LOG_LEVEL=1` on the device so that model inference performance and memory usage are logged (a scripted version of this step is sketched after this list).
  3. Use the `eval_perf_watch_cpu.sh` script to measure CPU utilization.
  4. Use the `eval_perf_watch_npu.sh` script to measure NPU utilization.
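
The logging step can also be driven from Python. Below is a minimal sketch; the demo binary name, model file, and arguments are illustrative placeholders, so substitute whatever you actually built from the repository examples:

```python
import os
import subprocess

# Enable RKLLM's built-in performance and memory logging for the child process.
env = dict(os.environ, RKLLM_LOG_LEVEL="1")

# "llm_demo", the model path, and the token limits are illustrative placeholders.
cmd = ["./llm_demo", "qwen2_0.5b_w8a8.rkllm", "320", "256"]

result = subprocess.run(cmd, env=env, capture_output=True, text=True)

# RKLLM prints its performance statistics alongside the normal demo output.
print(result.stdout)
print(result.stderr)
```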

Download

  1. Download the latest SDK package from RKLLM_SDK (fetch code: rkllm).
  2. Download converted RKLLM models from rkllm_model_zoo (fetch code: rkllm).

Examples

  1. Multimodal deployment demo: Qwen2-VL-2B_Demo
  2. API usage demo: DeepSeek-R1-Distill-Qwen-1.5B_Demo
  3. API server demo: rkllm_server_demo
  4. Multimodal interactive dialogue demo: Multimodal_Interactive_Dialogue_Demo

Note

  • The supported Python versions are:

    • Python 3.8
    • Python 3.9
    • Python 3.10
    • Python 3.11
    • Python 3.12

Note: Before installing the package in a Python 3.12 environment, first run:

export BUILD_CUDA_EXT=0

  • On some platforms, you may encounter an error indicating that libomp.so cannot be found. To resolve this, locate the library in the corresponding cross-compilation toolchain and copy it to the board's lib directory, at the same level as librkllmrt.so.
  • Latest version: v1.2.0

RKNN Toolkit2

If you want to deploy additional AI models, we also provide an SDK called RKNN-Toolkit2. For details, please refer to:

https://github.com/airockchip/rknn-toolkit2

CHANGELOG

v1.2.0

  • Supports custom model conversion.
  • Supports chat_template configuration.
  • Enables multi-turn dialogue interactions.
  • Implements automatic prompt cache reuse for improved inference efficiency.
  • Expands maximum context length to 16K.
  • Supports embedding flash storage to reduce memory usage.
  • Introduces the GRQ Int4 quantization algorithm.
  • Supports GPTQ-Int8 model conversion.
  • Compatible with the RK3562 platform.
  • Added support for visual multimodal models such as InternVL2, Janus, and Qwen2.5-VL.
  • Supports CPU core configuration.
  • Added support for Gemma3.
  • Added support for Python 3.9/3.11/3.12.

For older versions, please refer to the CHANGELOG.
