Kvax: Fast Multi-Document Flash Attention with Context Parallelism #26813
southfreebird asked this question in Show and tell
Hi everyone!
We've just open-sourced kvax, a custom Flash Attention implementation built on JAX. It is designed for efficient training on long sequences, with features including multi-document (document mask) attention and context parallelism.
We use this library in-house to train models on very long sequences, for example in our agentic research. In the document-mask scenario, it outperforms both the cuDNN implementation and FlexAttention.
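To illustrate what a document mask is (this is a conceptual sketch in plain JAX, not kvax's API), the idea is that several documents are packed into one sequence, and each token may only attend to earlier tokens within its own document. The function and variable names below (`document_causal_mask`, `masked_attention`, `segment_ids`) are hypothetical:

```python
# Conceptual sketch of multi-document ("document mask") attention in plain JAX.
# Not kvax's API -- just the masking idea a flash kernel can exploit.
import jax
import jax.numpy as jnp


def document_causal_mask(segment_ids: jnp.ndarray) -> jnp.ndarray:
    """Build a [seq, seq] boolean mask from per-token document (segment) IDs.

    True means "query i may attend to key j": same document AND causal (j <= i).
    """
    same_doc = segment_ids[:, None] == segment_ids[None, :]
    causal = jnp.tril(jnp.ones((segment_ids.shape[0],) * 2, dtype=bool))
    return same_doc & causal


def masked_attention(q, k, v, mask):
    """Reference (non-flash) attention with the mask applied before softmax."""
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    scores = jnp.where(mask, scores, -jnp.inf)
    return jax.nn.softmax(scores, axis=-1) @ v


# Example: two documents of lengths 3 and 2 packed into one sequence of length 5.
segment_ids = jnp.array([0, 0, 0, 1, 1])
key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (5, 8))
out = masked_attention(q, q, q, document_causal_mask(segment_ids))
```

Such a mask is block-sparse, which is presumably where a specialized flash-attention kernel gains over dense baselines: whole fully-masked blocks can be skipped instead of computed and discarded.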
The library is available under the Apache 2.0 license and can be easily integrated into an existing JAX codebase, especially if you are using Flax.
GitHub: https://github.com/nebius/kvax
Blog post with benchmarks
We hope it will be useful to the community and would appreciate any feedback!