
# Megakernels!

## Installation

Clone this repo and run:

```bash
git submodule update --init --recursive
pip install uv
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
uv pip install -e .
```

## Low-Latency Llama Demo

First, compile the megakernel:

```bash
# from the repo root
export THUNDERKITTENS_ROOT=$(pwd)/ThunderKittens
export MEGAKERNELS_ROOT=$(pwd)
export PYTHON_VERSION=3.12  # adjust if yours is different
export GPU=H100  # options: H100 or B200; any other value defaults to B200
cd demos/low-latency-llama
make
```
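If you are unsure which value to export for `GPU`, the device name reported by `nvidia-smi` maps straightforwardly onto the two options. A minimal sketch of that mapping (the helper name and fallback behavior are assumptions mirroring the default noted above, not part of this repo):

```python
def gpu_flag(device_name: str) -> str:
    """Map a GPU device name to the GPU flag used by the Makefile.

    Hypothetical helper: any device that is not an H100 falls back to
    B200, mirroring the Makefile's default described above.
    """
    return "H100" if "H100" in device_name.upper() else "B200"


if __name__ == "__main__":
    # Example device names as reported by nvidia-smi.
    print(gpu_flag("NVIDIA H100 80GB HBM3"))  # H100
    print(gpu_flag("NVIDIA B200"))            # B200
```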

To start an interactive chat session with the model, run:

```bash
# from the repo root
python megakernels/scripts/llama_repl.py
```

To benchmark the megakernel, run:

```bash
# from the repo root
python megakernels/scripts/generate.py mode=mk prompt="tell me a funny joke about cookies" ntok=100
```
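When comparing benchmark runs, the figure of interest is decode throughput. If a run reports only wall-clock time, tokens per second follows directly from `ntok`. A small sketch (the timing value below is illustrative, not a real measurement):

```python
def tokens_per_second(ntok: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock seconds."""
    return ntok / elapsed_s


if __name__ == "__main__":
    # e.g. 100 tokens generated in 0.5 s of wall-clock time
    print(tokens_per_second(100, 0.5))  # 200.0
```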