
# Megakernels!

## Installation

Clone this repo and run:

```bash
git submodule update --init --recursive
pip install uv
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
uv pip install -e .
```

## Low-Latency Llama Demo

First, compile the megakernel:

```bash
# from the repo root
export THUNDERKITTENS_ROOT=$(pwd)/ThunderKittens
export MEGAKERNELS_ROOT=$(pwd)
export PYTHON_VERSION=3.12  # adjust if yours is different
export GPU=H100  # options: H100 or B200; any other value defaults to B200
cd demos/low-latency-llama
make
```
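If you are unsure which value to export for `GPU`, the device name reported by `nvidia-smi` maps straightforwardly onto the two options. A minimal sketch of that mapping (the helper name and fallback behavior are assumptions mirroring the default noted above, not part of this repo):

```python
def gpu_flag(device_name: str) -> str:
    """Map a GPU device name to the GPU flag used by the Makefile.

    Hypothetical helper: any device that is not an H100 falls back to
    B200, mirroring the Makefile's default described above.
    """
    return "H100" if "H100" in device_name.upper() else "B200"


if __name__ == "__main__":
    # Example device names as reported by nvidia-smi.
    print(gpu_flag("NVIDIA H100 80GB HBM3"))  # H100
    print(gpu_flag("NVIDIA B200"))            # B200
```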

To start an interactive chat session with the model, run:

```bash
# from the repo root
python megakernels/scripts/llama_repl.py
```

To benchmark the megakernel, run:

```bash
# from the repo root
python megakernels/scripts/generate.py mode=mk prompt="tell me a funny joke about cookies" ntok=100
```
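When comparing benchmark runs, the figure of interest is decode throughput. If a run reports only wall-clock time, tokens per second follows directly from `ntok`. A small sketch (the timing value below is illustrative, not a real measurement):

```python
def tokens_per_second(ntok: int, elapsed_s: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock seconds."""
    return ntok / elapsed_s


if __name__ == "__main__":
    # e.g. 100 tokens generated in 0.5 s of wall-clock time
    print(tokens_per_second(100, 0.5))  # 200.0
```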