
Triton Kernels Playground

A playground for implementing and benchmarking Triton kernels against PyTorch equivalents.

Overview

This repository contains optimized Triton kernels and comprehensive benchmarks comparing them to equivalent PyTorch implementations, with a focus on performance analysis: throughput, latency, and memory usage.

Features

  • 🚀 High-performance kernels: Optimized Triton implementations
  • 📊 Comprehensive benchmarking: Throughput, latency, and memory analysis
  • 📈 Visualization: Automatic generation of comparison plots
  • 📝 Detailed reports: Markdown reports with speedup analysis
  • ✅ Tested: Unit tests for correctness verification

Kernels

  • MatMul (Tile-based): Optimized matrix multiplication using tiling strategies
  • LayerNorm: Efficient layer normalization implementation
  • Fused Softmax: Fused softmax operation with numerical stability
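The numerical-stability trick used by fused softmax kernels is subtracting the per-row maximum before exponentiating, which keeps every `exp()` argument at or below zero. A minimal NumPy reference sketch of that computation (illustrative only, not the Triton kernel in `kernels/`):

```python
import numpy as np

def stable_softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax using the max-subtraction trick for numerical stability."""
    # Subtracting the row max keeps exp() arguments <= 0, so exp() cannot overflow.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# A naive exp(x) would overflow to inf here; the stable version does not.
x = np.array([[1000.0, 1001.0, 1002.0]])
print(stable_softmax(x))
```

A fused Triton kernel performs the max, exponentiation, and normalization in a single pass over each row, avoiding the extra global-memory round trips a composed PyTorch expression would incur.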

Structure

```
triton-kernels-playground/
├── kernels/          # Triton kernel implementations
├── bench/            # Benchmarking infrastructure
│   ├── benchmark.py  # Core benchmarking utilities
│   ├── run_all.py    # Run all benchmarks
│   └── report_generator.py  # Generate markdown reports
├── utils/            # Utility functions
│   ├── logger.py     # JSONL logging
│   └── visualizer.py # Matplotlib visualization
├── tests/            # Unit tests
├── examples/         # Usage examples
└── reports/          # Generated benchmark reports
```

Installation

```bash
pip install -r requirements.txt
```

Quick Start

Run Examples

```bash
# Matrix multiplication example
python examples/matmul_example.py

# Layer normalization example
python examples/layernorm_example.py

# Softmax example
python examples/softmax_example.py
```

Run Benchmarks

```bash
# Run all benchmarks
python -m bench.run_all

# Generate report from latest benchmark
python -m bench.report_generator
```

Run Tests

```bash
pytest tests/
```

Benchmarks

Benchmarks measure:

  • Throughput: Operations per second
  • Latency: Time per operation (milliseconds)
  • Memory: Peak memory usage (MB)
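Latency and throughput both fall out of the same measurement: time many repeated runs after a warmup phase, then divide. A CPU-only sketch of that pattern (the function name and signature here are illustrative; the repository's `bench/benchmark.py` presumably uses CUDA event timing and synchronization for GPU kernels):

```python
import time

def benchmark(fn, *args, warmup=5, iters=50):
    """Time a callable, returning latency in ms and throughput in ops/s."""
    for _ in range(warmup):
        fn(*args)  # warmup runs: populate caches, trigger any JIT compilation
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    elapsed = time.perf_counter() - start
    return {
        "latency_ms": elapsed / iters * 1e3,   # average time per operation
        "throughput_ops_s": iters / elapsed,   # operations per second
    }

print(benchmark(sum, range(10_000)))
```

Note that timing GPU kernels this way requires a device synchronization before reading the clock, since CUDA launches are asynchronous; peak memory on GPU is typically read from the allocator's high-water mark rather than measured externally.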

Results are:

  • Logged to JSONL format (reports/benchmark_*.jsonl)
  • Visualized with matplotlib (reports/*.png)
  • Summarized in markdown reports (reports/benchmark_report.md)
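JSONL stores one JSON object per line, which makes benchmark logs appendable and easy to stream-parse. A minimal sketch of the read/write pattern (function names here are illustrative, not necessarily those in `utils/logger.py`):

```python
import json
import os
import tempfile

def log_jsonl(path, record):
    """Append one benchmark record as a single JSON line."""
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

def read_jsonl(path):
    """Read back all records, one per line."""
    with open(path) as f:
        return [json.loads(line) for line in f]

path = os.path.join(tempfile.gettempdir(), "benchmark_demo.jsonl")
open(path, "w").close()  # start with an empty log for the demo
log_jsonl(path, {"kernel": "matmul", "latency_ms": 0.42})
log_jsonl(path, {"kernel": "softmax", "latency_ms": 0.11})
print(read_jsonl(path))
```

Because each record is a self-contained line, a crashed benchmark run leaves all previously logged results intact.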

Requirements

  • Python >= 3.8
  • PyTorch >= 2.0.0
  • Triton >= 3.0.0
  • CUDA-capable GPU (required for Triton kernels)
  • NumPy >= 1.24.0
  • Matplotlib >= 3.7.0

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Changelog

See CHANGELOG.md for a list of changes and version history.
