Efficient Neural Network Deployment on Heterogeneous TinyML Platforms
HTVM is a deep learning compiler for deploying neural networks on heterogeneous embedded compute platforms with multiple scratchpad-managed accelerators. HTVM generates self-contained C code that runs neural network layers and dispatches each layer to either the platform's CPU or one of its accelerators.
To do this, HTVM mainly relies on:
- Apache TVM to generate CPU kernels, and run different layers sequentially.
- DORY to generate accelerator kernels with optimized scratchpad memory management.
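The dispatching described above can be pictured as a compile-time partitioning step: each layer is assigned to an accelerator if its operator is supported there, and falls back to TVM-generated CPU code otherwise. The sketch below is purely illustrative; the operator names and the supported-op set are our assumptions, not HTVM's actual API.

```python
# Toy sketch of compile-time layer partitioning (illustrative only; not HTVM's API).
# Layers whose op is in the accelerator's supported set are dispatched to it;
# everything else falls back to CPU kernels generated by TVM.

ACCEL_SUPPORTED_OPS = {"conv2d", "dense", "add"}  # hypothetical capability list

def partition(layers):
    """Map each (name, op) layer to the unit that will execute it."""
    return {
        name: "accelerator" if op in ACCEL_SUPPORTED_OPS else "cpu"
        for name, op in layers
    }

schedule = partition([
    ("conv1", "conv2d"),
    ("pool1", "max_pool2d"),  # not supported -> CPU fallback
    ("fc1", "dense"),
])
print(schedule)
```

The real compiler makes this decision per layer and then emits a single C program that calls the chosen kernel for each layer in sequence.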
Main requirements:
- TVM (contained in this repository) and tools to compile TVM.
- DORY at commit 8a0fe7bcadb207c6d80820a4bd2c2f2c0e823248
- Python 3.8
For DIANA, HTVM also requires:
- The adapted PULP-SDK for DIANA
- DORY Backend kernels for DIANA
- The PULP RISC-V GNU Toolchain
For your convenience, we advise using our Docker container, which has all dependencies needed for building TVM preinstalled.
We use podman commands here, but you can use docker as well if preferred.
Our GitHub CI has an up-to-date image available that you can pull with:
podman pull ghcr.io/kuleuven-micas/htvm:main
Or you could build the container image locally with:
git clone --recursive https://github.com/KULeuven-MICAS/htvm
cd htvm
podman build . -f diana/docker/Dockerfile.tvm -t htvm:main
Note: See the Dockerfile in case you want to attempt installation without a container.
If you haven't already cloned the repo, do:
git clone --recursive https://github.com/KULeuven-MICAS/htvm
cd htvm
Now create and start a container:
podman run -it -v "$(pwd)":/tvm-fork:z htvm:main
Inside the container shell run:
mkdir build
cp diana/config.cmake build
cd build
cmake ..
make -j$(nproc)
cd ..
Test if it works (also run from inside the container):
cd diana/byoc
python3 driver.py -h
A number of example ONNX models, quantized with diana-quantlib, are provided in this repo through Git LFS. For quantizing your own models, see diana-quantlib.
Download the model data with:
git lfs pull
Compile a model for DIANA with digital acceleration:
python3 driver.py --no-run --onnx test_data/export_resnet8/ResNet_QL_NOANNOTATION.onnx
Output C code and PULP binaries can be found at /tmp/digital_pulp_dory_fused_O3_None/pulp/.
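The output directory name appears to encode the build configuration (accelerator, device, codegen backend, fusion setting, optimization level, and an extra field). This is only an inference from the two example paths in this README; the field names below are guesses:

```python
# Sketch of the apparent output-path naming scheme, inferred from the example
# paths in this README (/tmp/digital_pulp_dory_fused_O3_None/pulp/ and
# /tmp/digital_x86_c_fused_O3_None/x86). Field names are our guesses.

def output_dir(accel="digital", device="pulp", codegen="dory",
               fusion="fused", opt="O3", extra="None"):
    return f"/tmp/{accel}_{device}_{codegen}_{fusion}_{opt}_{extra}/{device}"

print(output_dir())                           # -> /tmp/digital_pulp_dory_fused_O3_None/pulp
print(output_dir(device="x86", codegen="c"))  # -> /tmp/digital_x86_c_fused_O3_None/x86
```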
Compile a model to run on the CPU of your local machine:
python3 driver.py --no-run --device x86 --target c --onnx test_data/export_resnet8/ResNet_QL_NOANNOTATION.onnx
Output C code and x86 binaries can be found at /tmp/digital_x86_c_fused_O3_None/x86.
Run it locally with:
/tmp/digital_x86_c_fused_O3_None/x86/demo
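If you want to invoke the compiled demo from a script, for example to capture its output in a regression check, a small wrapper like the one below works. The helper name is ours, and the binary path only exists after the compilation step above:

```python
import subprocess

def run_binary(path, args=()):
    """Run an executable and return (exit code, stdout)."""
    result = subprocess.run([path, *args], capture_output=True, text=True)
    return result.returncode, result.stdout

# Example with the x86 demo built above (path exists only after compilation):
# code, out = run_binary("/tmp/digital_x86_c_fused_O3_None/x86/demo")
# print(code, out)
```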
In addition to the standard test suite provided by TVM, HTVM contains its own unit tests and end-to-end tests.
The unit tests can be run with:
cd /path/to/htvm
pytest -v tests/python/contrib/test_soma_dory
The end-to-end tests rely on example ONNX files that are tracked with Git LFS. Run git lfs pull in case you haven't done so already.
Now run:
cd diana/byoc
pytest -v test.py
HTVM currently supports deploying a number of tested neural networks on the DIANA heterogeneous SoC.
The front-end supports ingesting quantized neural networks in ONNX format from Quantlib.
HTVM is Apache 2.0 Licensed.
This repository started off as a fork of the Apache TVM project at commit 2af3ab1e36e0e78bac8448a0357abee317fabb1f, but has been rebased on upstream several times.