🚀 TileRT v0.1.1 – Ultra-Low-Latency Token Generation

TileRT v0.1.1 delivers a significant boost in token generation performance, reducing latency by roughly 35% compared to the previous release.

This improvement is achieved through optimizations to core operators and enhancements to the tile-level runtime engine. Key updates include faster GEMV kernels, expanded FP8/BF16 support across multiple kernels, and improved runtime scheduling and memory behavior.

✨ Highlights

Performance Boost: Token generation is now significantly faster, with latency reduced by around 35%. See our latest speed tests for exact figures.
Operator & Precision Optimizations: Faster GEMV, RMSNorm, and MMA-based operators with expanded FP8/BF16 support.
Runtime Enhancements: Improved tile-level scheduling, prefetching, memory alignment, and multi-device task handling.
Stability Fixes: Resolved issues affecting runtime stability and memory behavior.
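
To put the roughly 35% figure in perspective: if per-token latency drops by a fraction r, single-stream tokens per second rise by a factor of 1 / (1 - r). The sketch below is plain arithmetic, not a benchmark; the function names and the 10 ms baseline are hypothetical and are not measured TileRT numbers.

```python
def throughput_gain(latency_reduction: float) -> float:
    """Tokens-per-second speedup implied by a fractional latency reduction."""
    if not 0.0 <= latency_reduction < 1.0:
        raise ValueError("latency_reduction must be in [0, 1)")
    return 1.0 / (1.0 - latency_reduction)


def new_latency_ms(old_latency_ms: float, latency_reduction: float) -> float:
    """Per-token latency after applying the reduction."""
    return old_latency_ms * (1.0 - latency_reduction)


# Illustrative baseline only (assumption): 10 ms per token before the release.
print(f"{throughput_gain(0.35):.2f}x")           # 1.54x
print(f"{new_latency_ms(10.0, 0.35):.1f} ms")    # 6.5 ms
```

In other words, a 35% latency reduction corresponds to about 1.54x the token rate for a single decoding stream, assuming nothing else in the pipeline changes.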

🔧 What’s Changed

🚀 Performance & Operators

Optimized GEMV and RMSNorm operators for improved performance.
Expanded FP8/BF16 support across multiple kernels.
Improved expert selection performance.
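
For readers unfamiliar with the operators named above, here is a minimal pure-Python reference of what each one computes. These are textbook definitions only and say nothing about how TileRT implements its kernels. The function names are hypothetical, and reading "expert selection" as top-k routing with softmax weights over the selected experts is an assumption.

```python
import math


def gemv(matrix, vector):
    """Matrix-vector product: y[i] = sum_j matrix[i][j] * vector[j]."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def rmsnorm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def select_experts(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Assumed routing scheme, for illustration only.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


print(gemv([[1, 2], [3, 4]], [1, 1]))  # [3, 7]
```

During token-by-token decoding the activation is a single vector, which is why GEMV, rather than a full matrix-matrix multiply, tends to dominate per-token cost.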

⚙️ Runtime & Kernel Execution

Enhanced tile-level runtime engine for better scheduling, prefetching, and memory management.
Fixed shared memory alignment issues and problems with inter-operator dependencies.
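
As background on what an alignment issue can look like: when several buffers are carved out of one shared memory region, each buffer's start offset usually has to be rounded up to a required boundary. The sketch below shows that rounding in plain Python. The buffer names, sizes, and 16-byte alignment are assumptions for illustration and are not taken from TileRT's code.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (offset + alignment - 1) & ~(alignment - 1)


def layout(buffers, alignment=16):
    """Assign an aligned start offset to each (name, size_in_bytes) buffer.

    Returns the offsets by name and the total bytes consumed.
    """
    offsets = {}
    cursor = 0
    for name, size in buffers:
        cursor = align_up(cursor, alignment)
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


# Hypothetical buffers: a 10-byte region followed by a 20-byte region.
offsets, total = layout([("a", 10), ("b", 20)], alignment=16)
print(offsets, total)  # {'a': 0, 'b': 16} 36
```

Without the rounding step, buffer "b" would start at byte 10, which is the kind of misaligned offset that can break vectorized loads.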

🔮 Looking Ahead

TileRT is under active development. The next release and upcoming work will focus on:

Further latency reductions in token generation.
Introduction of new features, including MTP support.
Opening the weight converter, enabling decoupled layouts and more flexible kernel optimizations.

Refactoring and enhancements to the operators and the runtime engine are ongoing. We invite the community to follow our progress, test new features, and share feedback to help shape TileRT's future development.
