[JAX] Add support for Fused Attn MLA head_dim_qk != head_dim_v #1851
Merged: KshitijLakhani merged 8 commits into NVIDIA:main from KshitijLakhani:klakhani/feature/add-mla-jax-fused-support on Jun 13, 2025
Conversation
cyanguwa reviewed on Jun 4, 2025
Branch force-pushed from 612ffdf to ed071aa
/te-ci jax
Add support for Fused Attn MLA head_dim_qk != head_dim_v

- Modify is_fused_attn_kernel_available() to accept different head_dims for qk and v
- Modify FusedAttnHelper to accept different head_dims for qk and v, and modify the assert dims checks in parse_qkv_aval()
- Modify FusedAttnFwdPrimitive and FusedAttnBwdPrimitive to accept different head_dims for qk and v
- Modify the Fused Attn related cpp and csrc extension API calls to accept different head_dims for qk and v
- Modify the DotProductAttention call() to extract head dims separately for qk and v
- Modify the FusedAttn tests to accommodate the API changes in the FusedAttn API
- Add a test case for head_dim_qk != head_dim_v (failing)
- Modify the baseline JAX appropriately to reshape the output tensor based on v dims, not q dims

Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
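To make the test-related items in this commit concrete, here is a hedged pytest-style sketch of the kind of case being added. This is an illustrative reference check, not the actual TE-JAX fused-attention test suite; the shapes, dtypes, and the (192, 128) head-dim pair are assumptions.

```python
import pytest
import jax
import jax.numpy as jnp

@pytest.mark.parametrize("head_dim_qk,head_dim_v", [(128, 128), (192, 128)])
@pytest.mark.parametrize("dtype", [jnp.float16, jnp.bfloat16])
def test_output_head_dim_follows_v(head_dim_qk, head_dim_v, dtype):
    # q/k share head_dim_qk; v carries its own head_dim_v.
    kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
    q = jax.random.normal(kq, (2, 64, 8, head_dim_qk), dtype)
    k = jax.random.normal(kk, (2, 64, 8, head_dim_qk), dtype)
    v = jax.random.normal(kv, (2, 64, 8, head_dim_v), dtype)
    # Plain softmax(q k^T / sqrt(d_qk)) v baseline; logits are scaled by d_qk.
    logits = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(jnp.asarray(head_dim_qk, dtype))
    out = jnp.einsum("bhqk,bkhd->bqhd", jax.nn.softmax(logits, axis=-1), v)
    # The output head dim must come from v, not q.
    assert out.shape == (2, 64, 8, head_dim_v)
```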
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
Fix dim for output tensor by replacing with v head dim rather than q head dim

Add test cases for jax fused attn where head_dim_qk != head_dim_v for a combination of data types and attention type

Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
Use new FusedAttnRunner function signature for separate hidden dim for qk and v in Fused Attn distributed tests

Code clean up

Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
Branch force-pushed from 780c0d7 to 24788f1
/te-ci JAX
cyanguwa previously approved these changes on Jun 13, 2025
Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
cyanguwa approved these changes on Jun 13, 2025
Successful pipeline: 30053940
phu0ngng pushed a commit to phu0ngng/TransformerEngine that referenced this pull request on Jun 16, 2025
[JAX] Add support for Fused Attn MLA head_dim_qk != head_dim_v (NVIDIA#1851)

* Add support for Fused Attn MLA head_dim_qk != head_dim_v

  Modify is_fused_attn_kernel_available() to accept different head_dims for qk and v
  Modify FusedAttnHelper to accept different head_dims for qk and v and modify assert dims checks in parse_qkv_aval()
  Modify FusedAttnFwdPrimitive and FusedAttnBwdPrimitive to accept different head_dims for qk and v
  Modify Fused Attn related cpp and csrc extension API calls to accept different head_dims for qk and v
  Modify DotProductAttention call() to extract head dims separately for qk and v
  Modify the FusedAttn Tests to accommodate for API changes in FusedAttn API
  Add test case for head_dim_qk != head_dim_v (failing)
  Modify the baseline JAX appropriately to reshape the output vector based on v dims and not q dims

  Signed-off-by: Kshitij Janardan Lakhani <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Fix context dims in general DPA in test_fused_attn

  Signed-off-by: Kshitij Janardan Lakhani <[email protected]>

* Fix dim for output tensor by replacing with v head dim rather than q head dim

  Add test cases for jax fused attn where head_dim_qk != head_dim_v for a combination of data types and attention type

  Signed-off-by: Kshitij Janardan Lakhani <[email protected]>

* Modify the fused attn jax unit test case for head dim qk != head dim v

  Signed-off-by: Kshitij Janardan Lakhani <[email protected]>

* Use new FusedAttnRunner function signature for separate hidden dim for qk and v in Fused Attn distributed tests

  Code clean up

  Signed-off-by: Kshitij Janardan Lakhani <[email protected]>

* Fix usage of is_fused_attn signature in distributed tests

  Signed-off-by: Kshitij Janardan Lakhani <[email protected]>

* Remove unnecessary assert

  Signed-off-by: Kshitij Janardan Lakhani <[email protected]>

---------

Signed-off-by: Kshitij Janardan Lakhani <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
MLA (DS_v3) support is available on Hopper with cuDNN 9.10. However, this support is not yet exposed through the TE-JAX fused attention pathway. This PR adds it.
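For context, a minimal pure-JAX sketch (not the TE fused kernel or its API) of the behavior this enables: attention logits are computed and scaled with the qk head dim, while the output tensor inherits its head dim from v. The BSHD layout and the 192/128 head-dim split below are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def reference_mla_attention(q, k, v):
    # q, k: [batch, seq, heads, head_dim_qk]; v: [batch, seq, heads, head_dim_v]
    head_dim_qk = q.shape[-1]
    logits = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(head_dim_qk)
    probs = jax.nn.softmax(logits, axis=-1)
    # The contraction with v gives the output its head dim (head_dim_v, not head_dim_qk).
    return jnp.einsum("bhqk,bkhd->bqhd", probs, v)

kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(kq, (2, 128, 16, 192))  # head_dim_qk = 192
k = jax.random.normal(kk, (2, 128, 16, 192))
v = jax.random.normal(kv, (2, 128, 16, 128))  # head_dim_v = 128
out = reference_mla_attention(q, k, v)
assert out.shape == (2, 128, 16, 128)  # output follows head_dim_v
```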
Type of change
Changes
- Modify `is_fused_attn_kernel_available()` to accept different head_dims for qk and v
- Modify `FusedAttnHelper` to accept different head_dims for qk and v, and modify the assert dims checks in `parse_qkv_aval()`
- Modify `FusedAttnFwdPrimitive` and `FusedAttnBwdPrimitive` to accept different head_dims for qk and v
- Modify the `DotProductAttention` `call()` to extract head dims separately for qk and v (see the sketch after this list)
- Add test cases for `head_dim_qk != head_dim_v`
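As a rough sketch of the shape bookkeeping described above (an illustrative helper, not the actual `DotProductAttention` or `parse_qkv_aval()` code), the head dims can be read separately from q and v, with q and k still required to agree:

```python
from typing import Tuple
import jax.numpy as jnp

def extract_head_dims(q: jnp.ndarray, k: jnp.ndarray, v: jnp.ndarray) -> Tuple[int, int]:
    """Return (head_dim_qk, head_dim_v) for [..., num_heads, head_dim] tensors."""
    head_dim_qk, head_dim_v = q.shape[-1], v.shape[-1]
    # q and k must still share a head dim so the logits are well defined;
    # v may differ, and the attention output inherits head_dim_v.
    assert k.shape[-1] == head_dim_qk, "q and k head dims must match"
    return head_dim_qk, head_dim_v

# Example with MLA-style shapes where head_dim_qk (192) != head_dim_v (128).
q = jnp.zeros((2, 128, 16, 192))
k = jnp.zeros((2, 128, 16, 192))
v = jnp.zeros((2, 128, 16, 128))
print(extract_head_dims(q, k, v))  # (192, 128)
```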
Checklist: