A vector benchmark suite for mobile processors.
Swan contains scalar and vectorized (Arm Neon implementation) data-parallel kernels, taken from 12 frequently-used libraries of 4 real-world mobile applications:
- Chromium project (web browser)
- Android (operating system)
- WebRTC Project (audio/video messaging)
- PDFium (PDF rendering engine)
The following table shows the libraries and their usage across the evaluated mobile applications:
Library | Domain | Chromium | Android | WebRTC | PDFium |
---|---|---|---|---|---|
libjpeg-turbo | Image Processing | ✔️ | ❌ | ❌ | ✔️ |
libpng | Image Processing | ✔️ | ❌ | ❌ | ✔️ |
libwebp | Image Processing | ✔️ | ❌ | ❌ | ✔️ |
Skia | Graphics | ✔️ | ✔️ | ❌ | ✔️ |
WebAudio | Audio Processing | ✔️ | ❌ | ✔️ | ❌ |
PFFFT | Audio Processing | ✔️ | ✔️ | ✔️ | ❌ |
zlib | Data Compression | ✔️ | ✔️ | ❌ | ✔️ |
boringssl | Cryptography | ✔️ | ✔️ | ✔️ | ❌ |
Arm Optimized Routines | String Utilities | ✔️ | ✔️ | ✔️ | ✔️ |
libopus | Audio Processing | ✔️ | ✔️ | ✔️ | ❌ |
libvpx | Video Processing | ✔️ | ✔️ | ✔️ | ❌ |
XNNPACK | Machine Learning | ✔️ | ✔️ | ❌ | ❌ |
- src/benchmark: benchmark infrastructure to configure and launch kernels, generate input data, and compare output results.
- src/libraries/[LIB]/[KER]: scalar and vector implementations of the KER kernel of the LIB library.
- src/fake_neon: Arm Neon intrinsic simulator.
- src/scripts: Performance and power measurement scripts for Android Devices.
Swan is compiled as a stand-alone tool without any dependencies. Download the latest version of the benchmark from this repository:
git clone git@github.com:arkhadem/Swan.git
Kernels are equipped with utility functions that generate random inputs with the following characteristics:
- 720x1280 (HD) images for Image Processing, Graphics, and Video Processing libraries.
- 1 second of a standard audio stream with a 44.1 kHz sample rate for Audio Processing libraries.
- 128 KB data for Data Compression, Cryptography, and String Utility libraries.
- 156 layers of Convolutional Neural Networks for the Machine Learning library.
You can change the input sizes in the src/benchmark/swan.hpp header file.
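To give a feel for what these defaults mean in element counts, the sketch below restates them as constants and estimates how many full-register steps it takes to stream a buffer. The constant and function names are hypothetical and are not taken from src/benchmark/swan.hpp, and treating 720 as the width is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names: the real identifiers in src/benchmark/swan.hpp may differ.
constexpr std::size_t kImageWidth = 720;          // assumed orientation of "720x1280"
constexpr std::size_t kImageHeight = 1280;
constexpr std::size_t kAudioSampleRateHz = 44100; // 44.1 kHz
constexpr std::size_t kAudioSeconds = 1;
constexpr std::size_t kBufferBytes = 128 * 1024;  // 128 KB

// Number of pixels in one input image.
constexpr std::size_t image_pixels() { return kImageWidth * kImageHeight; }

// Number of samples in one input audio stream.
constexpr std::size_t audio_samples() { return kAudioSampleRateHz * kAudioSeconds; }

// Number of register-sized steps needed to cover `bytes` of data,
// rounding up so that a partial final step is counted.
inline std::size_t vector_iterations(std::size_t bytes, std::size_t register_bits) {
    assert(register_bits > 0 && register_bits % 8 == 0);
    const std::size_t register_bytes = register_bits / 8;
    return (bytes + register_bytes - 1) / register_bytes;
}
```

For example, `vector_iterations(kBufferBytes, 128)` counts the 128-bit (Q register) steps needed for the 128 KB buffer; passing 1024 instead gives the count for the widest simulated register.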
Compile Swan for local execution with:
make local -j<num_threads>
When compiling locally on a machine with any architecture other than armv8.2-a, Swan switches to simulation mode (refer to Arm Fake Neon Library).
Swan is tested with android-ndk-r23c for cross-compilation for Android devices with armv8.2-a architectures.
In addition, Swan requires the fp16, crypto, and crc extensions for the XNNPACK, boringssl, and zlib libraries, respectively.
Download and extract the Android NDK, and use the following instructions to set up the environment variables.
export ANDROID_NDK_ROOT=/path/to/your/android-ndk-xxxx # replace xxxx with NDK version (e.g., r23c)
export ANDROID_NDK_PATH=$ANDROID_NDK_ROOT/toolchains/llvm/prebuilt/linux-x86_64
Cross-compile Swan with:
make phone -j<num_threads>
Cross-compile Swan for RISC-V with:
make riscv -j<num_threads>
The Makefile accepts the following command-line arguments to configure the build.
- CACHE=[WARM|COLD]: warm up or cool down the caches before execution. Default is WARM.
- AUTOVEC=[FALSE|TRUE]: use compiler auto-vectorization. Default is FALSE.
- SIMMOD=[FALSE|TRUE]: simulation mode for local/riscv execution. Default is FALSE for armv8.2-a machines and TRUE for any other architecture. Uses Arm Fake Neon Library.
- SIMREG=[128|256|512|1024]: maximum supported width of example kernels developed with fake wide vector register implementations (effective only in simulation mode).
# compiling locally in simulation mode with cold caches and compiler auto-vectorization.
# example kernels use up to 512-bit fake wide vector registers
make local CACHE=COLD AUTOVEC=TRUE SIMMOD=TRUE SIMREG=512
# cross-compiling for a phone with warm caches and no compiler auto-vectorization
make phone CACHE=WARM AUTOVEC=FALSE
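The effect of a warm versus cold cache setting on a measurement can be pictured with the sketch below. It is an illustration under stated assumptions, not Swan's actual implementation: `CacheMode`, `evict_caches`, and `prepare_and_run` are hypothetical names, the 64-byte cache line is assumed, and touching a buffer larger than the last-level cache is only one common way to approximate a cold cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class CacheMode { Warm, Cold };  // hypothetical mirror of CACHE=[WARM|COLD]

// Touch one byte per (assumed) 64-byte cache line of a scratch buffer so that
// previously cached data is likely to be evicted. Returns a checksum of the
// touched bytes.
std::uint64_t evict_caches(std::size_t bytes) {
    std::vector<std::uint8_t> junk(bytes, 1);
    // volatile keeps the compiler from removing the memory accesses.
    volatile std::uint8_t* p = junk.data();
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < bytes; i += 64) {
        p[i] = static_cast<std::uint8_t>(p[i] + 1);
        sum += p[i];
    }
    return sum;
}

// Prepare the caches according to `mode`, then run the kernel once more as
// the run that would be measured.
template <typename Kernel>
void prepare_and_run(CacheMode mode, Kernel kernel, std::size_t llc_bytes) {
    assert(llc_bytes > 0);
    if (mode == CacheMode::Warm) {
        kernel();  // untimed warm-up pass brings the working set into cache
    } else {
        static volatile std::uint64_t sink;
        sink = evict_caches(2 * llc_bytes);  // scratch buffer twice the LLC size
    }
    kernel();  // the run that would be timed
}
```

In this sketch the warm path calls the kernel twice and the cold path calls it once after the eviction pass; `llc_bytes` is the caller's estimate of the last-level cache size.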
Run the benchmark suite with:
./swan_[local|phone] -p [PLATFORM] -l [LIBRARY] -k [KERNEL]
- PLATFORM: scalar, neon, or compare (compares scalar and neon output results)
- LIBRARY: name of the target library
- KERNEL: name of the target kernel
Use --help for more options or --list to get a list of supported libraries and kernels on the target machine architecture.
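The idea behind the compare platform, running two implementations of one kernel on the same input and checking that the outputs agree, can be sketched with Adler-32, one of the zlib kernels. The code below is plain C++ written for illustration and is not Swan's or zlib's source: `adler32_blocked` stands in for a vectorized variant by deferring the modulo over blocks of 5552 bytes (the block size zlib uses to avoid 32-bit overflow), and `outputs_match` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kAdlerMod = 65521;  // largest prime below 2^16

// Reference version: reduce modulo 65521 after every byte.
std::uint32_t adler32_scalar(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % kAdlerMod;
        b = (b + a) % kAdlerMod;
    }
    return (b << 16) | a;
}

// Blocked version: accumulate without reduction for up to 5552 bytes, which
// keeps both sums within 32 bits, then reduce once per block.
std::uint32_t adler32_blocked(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    constexpr std::size_t kBlock = 5552;
    std::uint32_t a = 1, b = 0;
    while (len > 0) {
        const std::size_t n = len < kBlock ? len : kBlock;
        for (std::size_t i = 0; i < n; ++i) {
            a += data[i];
            b += a;
        }
        a %= kAdlerMod;
        b %= kAdlerMod;
        data += n;
        len -= n;
    }
    return (b << 16) | a;
}

// Compare-style check: both implementations must produce the same checksum.
bool outputs_match(const std::vector<std::uint8_t>& input) {
    return adler32_scalar(input.data(), input.size()) ==
           adler32_blocked(input.data(), input.size());
}
```

A checksum kernel allows an exact equality check; floating-point kernels would typically need a tolerance instead, which this sketch does not cover.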
Having built Swan for phone, you can use the profiler script to dump profile results in a CSV file:
./scripts/profiler.py --measurement [power|performance] --output [profile.csv]
Profiler requires Android Debug Bridge (adb) installed locally.
Use --help for more options.
Recommendations:
- Configure core masks (CPU affinities) in scripts/mask.py based on your Android device's CPU configuration and use --core [core_name] to pin the Swan process to a specific core.
- Use adb wireless debugging, especially for power measurements.
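A core mask is conventionally a bitmask in which bit i selects CPU i. The sketch below builds such a mask from a list of CPU indices. It assumes this conventional encoding; the exact format expected by scripts/mask.py is not shown here, and `affinity_mask` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Build a CPU affinity bitmask: bit i is set when CPU i is allowed.
// Supports CPU indices 0..31.
std::uint32_t affinity_mask(std::initializer_list<unsigned> cores) {
    std::uint32_t mask = 0;
    for (unsigned core : cores) {
        assert(core < 32);
        mask |= (std::uint32_t{1} << core);
    }
    return mask;
}
```

On a hypothetical eight-core device whose big cores are CPUs 4 to 7, `affinity_mask({4, 5, 6, 7})` yields 0xF0, and a single-core mask such as `affinity_mask({7})` pins the process to one core.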
Swan is equipped with a library of fake Arm Neon intrinsics that is used in simulation mode for multiple purposes:
- Fake Neon library extends 64 (D) and 128-bit (Q) vector registers of Arm Neon to 256 (Double-Q or DQ), 512 (Quad-Q or QQ), and 1024-bit (Octa-Q or OQ) registers. This library implements all intrinsics for all data widths, which could be used to study the scalability of mobile data-parallel applications with wider registers.
- Using this library, Swan is not limited to the armv8.2-a architecture and can be simulated on any architecture, such as x86-64.
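The relationship between register width and lane count can be sketched with a small template. This is a conceptual illustration only: `WideVec` and `wide_add` are hypothetical names and do not reflect the types or intrinsics that Fake Neon actually defines.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A software stand-in for a vector register of `Bits` bits holding elements
// of type T. The lane count follows from the register and element widths.
template <std::size_t Bits, typename T>
struct WideVec {
    static_assert(Bits % (8 * sizeof(T)) == 0,
                  "register width must be a multiple of the element width");
    static constexpr std::size_t kLanes = Bits / (8 * sizeof(T));
    std::array<T, kLanes> lane{};
};

// Lane-wise addition, the scalar-loop equivalent of a vector add.
template <std::size_t Bits, typename T>
WideVec<Bits, T> wide_add(const WideVec<Bits, T>& a, const WideVec<Bits, T>& b) {
    WideVec<Bits, T> r;
    for (std::size_t i = 0; i < WideVec<Bits, T>::kLanes; ++i) {
        r.lane[i] = static_cast<T>(a.lane[i] + b.lane[i]);
    }
    return r;
}
```

With this model, a 128-bit register of 8-bit elements has 16 lanes, while a 1024-bit register of 32-bit floats has 32 lanes, so each wider register processes proportionally more elements per operation.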
Simply include fake_neon.hpp (instead of arm_neon.h) and compile src/fake_neon/*.cpp along with your vectorized kernel source code.
Before and after calling a vectorized kernel, inject Fake Neon initializer and finisher APIs:
fake_neon_initializer("kernel_name");
vectorized_kernel();
fake_neon_finisher();
Executing a kernel in simulation mode dumps the vector instruction Data-Flow Graph (DFG) to a file named [kernel_name]_dfg.txt.
The DFG enables studying different characteristics of vectorized data-parallel kernels, such as Instruction-Level Parallelism (ILP).
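One simple way to estimate ILP from a data-flow graph is to divide the number of instructions by the length of the longest dependency chain, assuming unit latency and unlimited execution resources. The sketch below applies that estimate to an in-memory graph. The `DfgNode` representation is hypothetical: the format of the [kernel_name]_dfg.txt file is not described here, so parsing is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One vector instruction. `deps` lists the indices of earlier instructions
// whose results this instruction consumes (nodes are in program order).
struct DfgNode {
    std::vector<std::size_t> deps;
};

// Length, in instructions, of the longest dependency chain in the graph.
std::size_t critical_path(const std::vector<DfgNode>& dfg) {
    std::vector<std::size_t> depth(dfg.size(), 1);
    std::size_t longest = 0;
    for (std::size_t i = 0; i < dfg.size(); ++i) {
        for (std::size_t d : dfg[i].deps) {
            assert(d < i);  // dependencies must point to earlier instructions
            depth[i] = std::max(depth[i], depth[d] + 1);
        }
        longest = std::max(longest, depth[i]);
    }
    return longest;
}

// Idealized ILP: instruction count divided by critical-path length.
// Returns 0 for an empty graph.
double ilp(const std::vector<DfgNode>& dfg) {
    const std::size_t cp = critical_path(dfg);
    return cp == 0 ? 0.0 : static_cast<double>(dfg.size()) / static_cast<double>(cp);
}
```

Under this estimate, a chain of four dependent instructions has an ILP of 1, while four independent instructions have an ILP of 4.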
The following kernels are extended with wide fake Neon implementations. Please refer to their Neon source code to see the examples.
Library | Kernel |
---|---|
libjpeg-turbo | RGB2YCbCr |
libwebp | TM Prediction |
Skia | Convolve Horizontally |
WebAudio | Audible |
zlib | Adler-32 |
libopus | Pitch Correlation |
libvpx | SAD |
XNNPACK | FP32 GEMM |
The current version of the Fake Neon library does not support:
- fp16 extension: FP16 implementations of GEMM and SpMM kernels of XNNPACK library.
- crypto extension: all kernels of boringssl library.
- crc extension: CRC32 kernel of zlib library.
- assembly code: all kernels of Arm Optimized Routines library.
If you use Swan or find Swan useful, please cite this paper:
Alireza Khadem, Daichi Fujiki, Nishil Talati, Scott Mahlke, and Reetuparna Das. Vector-Processing for Mobile Devices: Benchmark and Analysis, In 2023 IEEE International Symposium on Workload Characterization (IISWC)
@inproceedings{swan,
title={Vector-Processing for Mobile Devices: Benchmark and Analysis},
author={Khadem, Alireza and Fujiki, Daichi and Talati, Nishil and Mahlke, Scott and Das, Reetuparna},
booktitle={2023 IEEE International Symposium on Workload Characterization (IISWC)},
year={2023}
}
Swan is under active development. We appreciate any feedback and suggestions from the community. Feel free to raise an issue or submit a pull request on GitHub. For assistance in using Swan, please contact: Alireza Khadem ([email protected])
Each kernel is individually licensed according to the library it is extracted from. The Swan benchmarking infrastructure and the Fake Neon library are available under an MIT license.
This work was supported in part by the NSF under CAREER-1652294 and NSF-1908601 awards and the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-sponsored by SRC and DARPA.