[Splash Attention] Remove Unnecessary head_dim_v Constraint and Update Scratch Array Shapes #27427 #27461
This PR addresses issue #27427 by removing the unnecessary constraint that required head_dim_v to be a multiple of NUM_LANES. The following changes are made:

Constraint Removal:
The check that rejected configurations whose head_dim_v is not a multiple of NUM_LANES has been removed. This constraint prevented small models (e.g. sLLMs) with such head dimensions from using Splash Attention.
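For context, here is a minimal sketch of the kind of divisibility check being removed; the constant value, function name, and error message are illustrative rather than the kernel's actual code:

```python
NUM_LANES = 128  # illustrative TPU lane width

def check_head_dim_v(head_dim_v: int) -> None:
    # Before this PR, a splash attention configuration with e.g. head_dim_v=72
    # would be rejected by a check of roughly this form.
    if head_dim_v % NUM_LANES != 0:
        raise ValueError(
            f"head_dim_v={head_dim_v} must be a multiple of NUM_LANES={NUM_LANES}"
        )
```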
Scratch Array Shape Update:
The scratch arrays m_scratch and l_scratch are now allocated with shape [bq, 1] instead of [bq, NUM_LANES], reducing redundant memory allocation. Downstream operations rely on broadcasting to expand these values where needed, as sketched below.
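A small sketch of why the narrower scratch shape is sufficient; the sizes and the jnp computation below are illustrative, not the Pallas kernel code itself:

```python
import jax.numpy as jnp

bq, NUM_LANES, head_dim_v = 8, 128, 72  # illustrative block sizes

# Old allocation: one column per lane, every column carrying the same per-row value.
m_wide = jnp.zeros((bq, NUM_LANES), dtype=jnp.float32)

# New allocation: a single column per row.
m_narrow = jnp.zeros((bq, 1), dtype=jnp.float32)

# Downstream use: broadcasting [bq, 1] against a [bq, head_dim_v] operand
# produces the same result as materializing the repeated columns.
o = jnp.ones((bq, head_dim_v), dtype=jnp.float32)
scaled = o * jnp.exp(m_narrow)  # shape (bq, head_dim_v)
```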
Alpha Repetition Removal:
All uses of pltpu.repeat(alpha, head_dim_v_repeats, axis=1) have been replaced with a direct assignment (alpha_o = alpha), relying on broadcasting to produce the same result.
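The equivalence the direct assignment relies on can be sketched as follows; the shapes are simplified relative to the kernel (where alpha previously spanned NUM_LANES columns and was tiled head_dim_v_repeats times):

```python
import jax.numpy as jnp

bq, head_dim_v = 8, 72  # illustrative sizes
alpha = jnp.full((bq, 1), 0.5, dtype=jnp.float32)       # per-row rescaling factor
o_prev = jnp.ones((bq, head_dim_v), dtype=jnp.float32)  # running output block

# Old approach (illustrated with jnp.repeat): materialize alpha to full width.
alpha_explicit = jnp.repeat(alpha, head_dim_v, axis=1)

# New approach: keep alpha as [bq, 1] and let the multiply broadcast it.
alpha_o = alpha
assert jnp.allclose(o_prev * alpha_o, o_prev * alpha_explicit)
```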
Testing:
All existing tests pass in both parallel and serial test runs.