
Commit 23d06e8

cpu: aarch64: integrate KleidiAI through oneDNN API
- Expose oneDNN KleidiAI kernels via BRGeMM API
- Enable tiling through A,B offsets parameter
  - pass "(-1, -1)" as offset for full matrix (MxN output)
  - pass "vector of m_idx, n_idx" for one ukernel execution where one execution computes (m_step X n_step)
- Update documentation to validate integration
- Add functionality to benchdnn to execute F32 Kleidi kernels via BRGeMM API
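
The tiling scheme in the commit message can be pictured with a small sketch. It assumes that `m_idx` and `n_idx` are element offsets of a tile's top-left corner, which the message does not state explicitly. `tile_offsets` and `full_matrix_offset` are hypothetical names used only for illustration; they are not part of the oneDNN or KleidiAI APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sentinel mirroring the "(-1, -1)" offset from the commit
// message: one call covers the full M x N output.
constexpr std::pair<int64_t, int64_t> full_matrix_offset {-1, -1};

// Enumerates the (m_idx, n_idx) origin of every tile when an M x N output
// is split into tiles of at most m_step x n_step elements.
// Assumption: offsets are counted in elements, not in tiles.
std::vector<std::pair<int64_t, int64_t>> tile_offsets(
        int64_t M, int64_t N, int64_t m_step, int64_t n_step) {
    assert(M > 0 && N > 0 && m_step > 0 && n_step > 0);
    std::vector<std::pair<int64_t, int64_t>> offsets;
    for (int64_t m_idx = 0; m_idx < M; m_idx += m_step)
        for (int64_t n_idx = 0; n_idx < N; n_idx += n_step)
            offsets.emplace_back(m_idx, n_idx);
    return offsets;
}
```

For a 4x6 output with `m_step = 2` and `n_step = 4`, the sketch yields four offsets: (0, 0), (0, 4), (2, 0), (2, 4). The last column of tiles is narrower than `n_step`, so a real caller would have to handle that remainder.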
1 parent 239dfba commit 23d06e8

29 files changed: +1869 -154 lines changed

CMakeLists.txt

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,6 @@
 #===============================================================================
 # Copyright 2016-2024 Intel Corporation
+# Copyright 2025 Arm Ltd. and affiliates
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -98,6 +99,7 @@ include("cmake/OpenCL.cmake")
 include("cmake/platform.cmake")
 include("cmake/SDL.cmake")
 include("cmake/ACL.cmake")
+include("cmake/KleidiAI.cmake")
 include("cmake/blas.cmake")
 include("cmake/doc.cmake")
 include("cmake/version.cmake")

README.md

Lines changed: 15 additions & 5 deletions
@@ -169,13 +169,23 @@ Intel C++ Compiler.
 [Intel oneAPI DPC++/C++ Compiler]: https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler.html

 On a CPU based on Arm AArch64 architecture, oneDNN CPU engine can be built with
-[Arm Compute Library (ACL)] integration. ACL is an open-source library for
-machine learning applications and provides AArch64 optimized implementations
-of core functions. This functionality currently requires that ACL is downloaded
-and built separately. See [Build from Source] section of the Developer Guide for
-details. oneDNN only supports Compute Library versions 24.11.1 or later.
+[Arm Compute Library (ACL)] or/and [KleidiAI (KAI)] integration.
+
+ACL is an open-source library for machine learning applications and
+provides AArch64 optimized implementations of core functions. This functionality
+currently requires that ACL is downloaded and built separately.
+See [Build from Source] section of the Developer Guide for details. oneDNN only
+supports Compute Library versions 24.11.1 or later.
+
+Arm® KleidiAI™ is an open-source library that provides optimized performance-critical
+routines, also known as micro-kernels, for artificial intelligence (AI) workloads
+tailored for Arm® CPUs. This functionality currently requires that KAI is
+downloaded and built separately.
+See [Build from Source] section of the Developer Guide for details. oneDNN only
+supports KleidiAI versions 1.4.0 or later.

 [Arm Compute Library (ACL)]: https://github.com/arm-software/ComputeLibrary
+[KleidiAI (KAI)]: https://gitlab.arm.com/kleidi/kleidiai

 ### GPU Engine

THIRD-PARTY-PROGRAMS

Lines changed: 2 additions & 1 deletion
@@ -134,7 +134,8 @@ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 --------------------------------------------------------------------------------
-4. CMake (cmake/FindOpenCL.cmake, cmake/FindBLAS.cmake, cmake/FindACL.cmake)
+4. CMake (cmake/FindOpenCL.cmake, cmake/FindBLAS.cmake, cmake/FindACL.cmake,
+cmake/FindKleidiAI.cmake)
 CMake - Cross Platform Makefile Generator
 Copyright 2000-2020 Kitware, Inc. and Contributors
 All rights reserved.

cmake/FindKleidiAI.cmake

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+# ******************************************************************************
+# Copyright 2025 Arm Limited and affiliates.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ******************************************************************************
+
+# ----------
+# FindKleidiAI
+# ----------
+#
+# Finds KleidiAI
+#
+# This module defines the following variables:
+#
+# KAI_INCLUDE_DIR - include directories for KleidiAI
+# KAI_LIBRARY - link against this library to use KleidiAI
+#
+# The module will also define two cache variables:
+#
+# KAI_INCLUDE_DIR - the KleidiAI include directory
+# KAI_LIBRARY - the path to the KleidiAI library
+#
+
+find_path(KAI_INCLUDE_DIR
+NAMES kai/kai_common.h
+PATHS ENV KAI_ROOT_DIR
+)
+
+find_library(KAI_LIBRARY
+NAMES kleidiai
+PATHS ENV KAI_ROOT_DIR
+PATH_SUFFIXES lib build
+)
+
+include(FindPackageHandleStandardArgs)
+find_package_handle_standard_args(KleidiAI DEFAULT_MSG
+KAI_INCLUDE_DIR
+KAI_LIBRARY
+)
+
+mark_as_advanced(
+KAI_LIBRARY
+KAI_INCLUDE_DIR
+)

cmake/KleidiAI.cmake

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# ******************************************************************************
+# Copyright 2025 Arm Limited and affiliates.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ******************************************************************************
+
+if(kleidiai_cmake_included)
+return()
+endif()
+set(kleidiai_cmake_included true)
+include("cmake/options.cmake")
+
+if(NOT DNNL_TARGET_ARCH STREQUAL "AARCH64")
+return()
+endif()
+
+if(NOT DNNL_AARCH64_USE_KAI)
+return()
+endif()
+
+find_package(KleidiAI REQUIRED)
+
+include_directories(${KAI_INCLUDE_DIR})
+set_property(GLOBAL APPEND PROPERTY DNNL_SUBDIR_EXTRA_STATIC_LIBS ${KAI_LIBRARY})
+
+add_definitions(-DDNNL_AARCH64_USE_KAI)
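
The `add_definitions(-DDNNL_AARCH64_USE_KAI)` call makes the macro visible to every C++ translation unit in the build, so source code can select a KleidiAI path at compile time. The following is a minimal sketch of that pattern, not code from this commit; the function name `ukernel_backend` and the returned strings are hypothetical.

```cpp
#include <cassert>
#include <string>

// Reports which code path this translation unit was compiled with.
// DNNL_AARCH64_USE_KAI is the definition added by cmake/KleidiAI.cmake;
// everything else here is a hypothetical illustration.
std::string ukernel_backend() {
#ifdef DNNL_AARCH64_USE_KAI
    return "kleidiai"; // built with -DDNNL_AARCH64_USE_KAI
#else
    return "reference"; // macro absent: KleidiAI code is compiled out
#endif
}
```

Because the module returns early on non-AArch64 targets and when the option is off, the macro is never defined in those builds and only the fallback branch is compiled.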

cmake/options.cmake

Lines changed: 12 additions & 3 deletions
@@ -1,5 +1,6 @@
 #===============================================================================
 # Copyright 2018-2025 Intel Corporation
+# Copyright 2025 Arm Ltd. and affiliates
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -234,7 +235,7 @@ set(ONEDNN_EXPERIMENTAL_GRAPH_COMPILER_CPU_JIT "builtin" CACHE STRING
 # Profiling capabilities
 # ======================

-# TODO: restore default to ON after the issue with linking C files by
+# TODO: restore default to ON after the issue with linking C files by
 # Intel oneAPI DPC++ Compiler is fixed. Currently this compiler issues a warning
 # when linking object files built from C and C++ sources.
 option(DNNL_ENABLE_JIT_PROFILING
@@ -245,8 +246,8 @@ option(DNNL_ENABLE_JIT_PROFILING
 ON)

 option(DNNL_ENABLE_ITT_TASKS
-"Enable ITT Tasks tagging feature and tag all primitive execution
-(on by default). VTune Profiler can group profiling results based
+"Enable ITT Tasks tagging feature and tag all primitive execution
+(on by default). VTune Profiler can group profiling results based
 on those ITT tasks and show corresponding timeline information."
 ON)

@@ -425,3 +426,11 @@ option(DNNL_AARCH64_USE_ACL "Enables use of AArch64 optimised functions
 This is only supported on AArch64 builds and assumes there is a
 functioning Compute Library build available at the location specified by the
 environment variable ACL_ROOT_DIR." OFF)
+
+# ==============================================
+# AArch64 optimizations with Arm KleidiAI
+# ==============================================
+
+option(DNNL_AARCH64_USE_KAI "Enables use of AArch64
+optimised micro-kernels from Arm KleidiAI.
+This is only supported on AArch64 builds." OFF)

doc/Doxyfile.in

Lines changed: 4 additions & 3 deletions
@@ -1,5 +1,6 @@
 #===============================================================================
 # Copyright 2016-2022 Intel Corporation
+# Copyright 2025 Arm Ltd. and affiliates
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -533,7 +534,7 @@ INTERNAL_DOCS = NO
 # and Mac users are advised to set this option to NO.
 # The default value is: system dependent.

-CASE_SENSE_NAMES = NO
+CASE_SENSE_NAMES = NO

 # If the HIDE_SCOPE_NAMES tag is set to NO then doxygen will show members with
 # their full class and namespace scopes in the documentation. If set to YES the
@@ -1818,7 +1819,7 @@ MAN_LINKS = NO
 # captures the structure of the code including all documentation.
 # The default value is: NO.

-GENERATE_XML = YES
+GENERATE_XML = YES

 # The XML_OUTPUT tag is used to specify where the XML pages will be put. If a
 # relative path is entered the value of OUTPUT_DIRECTORY will be put in front of
@@ -1962,7 +1963,7 @@ INCLUDE_FILE_PATTERNS =
 # recursively expanded use the := operator instead of the = operator.
 # This tag requires that the tag ENABLE_PREPROCESSING is set to YES.

-PREDEFINED = DOXYGEN_SHOULD_SKIP_THIS DNNL_GPU_RUNTIME=DNNL_RUNTIME_OCL DNNL_WITH_SYCL DNNL_USE_SYCL_BUFFERS DNNL_EXPERIMENTAL_SPARSE DNNL_EXPERIMENTAL_UKERNEL DNNL_EXPERIMENTAL_LOGGING
+PREDEFINED = DOXYGEN_SHOULD_SKIP_THIS DNNL_GPU_RUNTIME=DNNL_RUNTIME_OCL DNNL_WITH_SYCL DNNL_USE_SYCL_BUFFERS DNNL_EXPERIMENTAL_SPARSE DNNL_EXPERIMENTAL_UKERNEL DNNL_EXPERIMENTAL_LOGGING DNNL_AARCH64_USE_KAI

 # If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then this
 # tag can be used to specify a list of macro names that should be expanded. The

doc/build/build_options.md

Lines changed: 31 additions & 5 deletions
@@ -30,6 +30,7 @@ oneDNN supports the following build-time options.
 | ONEDNN_VERBOSE | **ON**, OFF | Enables [verbose mode](@ref dev_guide_verbose) |
 | ONEDNN_DEV_MODE | ON, **OFF** | Enables internal tracing and `debuginfo` logging in verbose output (for oneDNN developers) |
 | ONEDNN_AARCH64_USE_ACL | ON, **OFF** | Enables integration with Arm Compute Library for AArch64 builds |
+| ONEDNN_AARCH64_USE_KAI | ON, **OFF** | Enables integration with KleidiAI Library for AArch64 builds |
 | ONEDNN_BLAS_VENDOR | **NONE**, ARMPL, ACCELERATE | Defines an external BLAS library to link to for GEMM-like operations |
 | ONEDNN_GPU_VENDOR | NONE, **INTEL**, NVIDIA, AMD | When DNNL_GPU_RUNTIME is not NONE defines GPU vendor for GPU engines otherwise its value is NONE|
 | ONEDNN_DPCPP_HOST_COMPILER | **DEFAULT**, *GNU or Clang C++ compiler executable* | Specifies host compiler executable for SYCL runtime |
@@ -264,10 +265,11 @@ By default, AArch64 builds will use the reference implementations throughout.
 The following options enable the use of AArch64 optimised implementations
 for a limited number of operations, provided by AArch64 libraries.

-| AArch64 build configuration | CMake Option | Environment variables | Dependencies |
-|:-------------------------------------|:--------------------------|:-----------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
-| Arm Compute Library based primitives | ONEDNN_AARCH64_USE_ACL=ON | ACL_ROOT_DIR=*</path/to/ComputeLibrary>* | [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) |
-| Vendor BLAS library support | ONEDNN_BLAS_VENDOR=ARMPL | None | [Arm Performance Libraries](https://developer.arm.com/tools-and-software/server-and-hpc/downloads/arm-performance-libraries) |
+| AArch64 build configuration | CMake Option | Environment variables | Dependencies |
+|:-------------------------------------|:-------------------------------|:-----------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------|
+| Arm Compute Library based primitives | ONEDNN_AARCH64_USE_ACL=ON | ACL_ROOT_DIR=*</path/to/ComputeLibrary>* | [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) |
+| Arm KleidiAI based ukernels | ONEDNN_AARCH64_USE_KAI=ON | KAI_ROOT_DIR=*</path/to/KleidiAI>* | [Arm KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) |
+| Vendor BLAS library support | ONEDNN_BLAS_VENDOR=ARMPL | None | [Arm Performance Libraries](https://developer.arm.com/tools-and-software/server-and-hpc/downloads/arm-performance-libraries) |

 #### Arm Compute Library
 Arm Compute Library is an open-source library for machine learning applications.
@@ -289,7 +291,31 @@ For a debug build of oneDNN it is advisable to specify a Compute Library build
 which has also been built with debug enabled.

 @warning
-oneDNN only supports builds with Compute Library v23.11 or later.
+oneDNN only supports builds with Compute Library v24.11.1 or later.
+
+#### KleidiAI
+KleidiAI that provides optimized performance-critical
+routines, also known as micro-kernels, for artificial intelligence (AI) workloads
+tailored for Arm® CPUs.
+The development repository and releases
+are available on [GitLab](https://gitlab.arm.com/kleidi/kleidiai).
+The `ONEDNN_AARCH64_USE_KAI` CMake option is used to enable Kleidi integration,
+in addition to `ONEDNN_EXPERIMENTAL_UKERNEL`:
+
+~~~sh
+$ cmake -DONEDNN_EXPERIMENTAL_UKERNEL=ON -DONEDNN_AARCH64_USE_KAI=ON ..
+~~~
+
+This assumes that the environment variable `KAI_ROOT_DIR` is
+set to the location of KleidiAI, which must be downloaded and built
+independently of oneDNN.
+
+@warning
+For a debug build of oneDNN it is advisable to specify a KleidiAI build
+which has also been built with debug enabled.
+
+@warning
+oneDNN only supports builds with KleidiAI v1.4.0 or later.

 #### Vendor BLAS libraries
 oneDNN can use a standard BLAS library for GEMM operations.

examples/CMakeLists.txt

Lines changed: 3 additions & 0 deletions
@@ -63,6 +63,9 @@ endif()

 if(NOT DNNL_EXPERIMENTAL_UKERNEL)
 list(REMOVE_ITEM sources ${CMAKE_CURRENT_SOURCE_DIR}/ukernels/cpu_brgemm.cpp)
+list(REMOVE_ITEM sources ${CMAKE_CURRENT_SOURCE_DIR}/ukernels/cpu_kleidiai.cpp)
+elseif(NOT DNNL_AARCH64_USE_KAI)
+list(REMOVE_ITEM sources ${CMAKE_CURRENT_SOURCE_DIR}/ukernels/cpu_kleidiai.cpp)
 endif()

 # Remove tests for CUDA which use unimplemented primitives
