Rust + CUDA = Fast and simple inference library from scratch
Requirements: a Linux machine with CUDA 12.x, cuBLAS, and Rust installed.
You need a GPU with at least the sm_80 compute capability (Ampere or newer). (This is hardcoded for now.)
WIP
Our first goal is to support bfloat16 Llama 3.2 1B inference.
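For context on the bfloat16 target: bf16 is simply the top 16 bits of an IEEE-754 f32 (same 8-bit exponent, truncated mantissa), which is why it is a popular inference format. A minimal standalone sketch of the conversion, not code from this library:

```rust
// Sketch only: bf16 keeps the f32 sign and 8-bit exponent, but just
// 7 explicit mantissa bits, so converting is a 16-bit shift.
fn f32_to_bf16_bits(x: f32) -> u16 {
    // Truncating conversion; real libraries usually round-to-nearest-even.
    (x.to_bits() >> 16) as u16
}

fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    let x = 3.14159_f32;
    let y = bf16_bits_to_f32(f32_to_bf16_bits(x));
    // The round trip is lossy but close: bf16 has ~2-3 decimal digits
    // of precision while keeping the full f32 dynamic range.
    println!("{x} -> {y}");
    assert!((x - y).abs() < 0.01);
}
```

This truncating variant drops the low mantissa bits outright; production kernels typically round to nearest even instead, which halves the worst-case error.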