Add lowering for insert_slice-like scatter ops (KV-cache) #2771

Open
wants to merge 1 commit into
base: main

@Wheest Wheest commented Apr 8, 2025

Coming from #1758, this PR adds a lowering for a narrow case of stablehlo.scatter, namely those that are equivalent to a tensor.insert_slice.

This is a common case, since it's how KV-cache updates are modelled when exporting from PyTorch.
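To make the equivalence concrete, here is a minimal Python sketch of why a scatter that writes to consecutive positions behaves like an insert_slice. It is a deliberately simplified model: it is not the full stablehlo.scatter semantics (no update computation, no dimension numbers) and not the matching logic used in this PR. The function names `scatter_rows`, `insert_slice`, and `is_insert_slice_like` are hypothetical, and indices are assumed to be in bounds.

```python
def scatter_rows(dest, indices, updates):
    """Simplified scatter: write updates[k] to row indices[k] of dest."""
    out = [row[:] for row in dest]
    for idx, row in zip(indices, updates):
        out[idx] = row[:]
    return out


def insert_slice(dest, update, offset):
    """Overwrite len(update) contiguous rows of dest, starting at offset."""
    out = [row[:] for row in dest]
    out[offset:offset + len(update)] = [row[:] for row in update]
    return out


def is_insert_slice_like(indices):
    """True when the scatter indices are consecutive, i.e. one contiguous window."""
    return all(b == a + 1 for a, b in zip(indices, indices[1:]))


# A toy KV-cache with 4 sequence positions and 2 features per position.
cache = [[0.0, 0.0] for _ in range(4)]
new_kv = [[1.0, 2.0], [3.0, 4.0]]
positions = [1, 2]  # consecutive positions, as in a KV-cache update

assert is_insert_slice_like(positions)
assert scatter_rows(cache, positions, new_kv) == insert_slice(cache, new_kv, positions[0])
```

When the indices are not consecutive (for example `[0, 2]`), the check fails and the general scatter semantics are needed.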

The code is adapted from an implementation in the catalyst compiler by @erick-xanadu, expanded to cover the cases I saw coming out of PyTorch.

The conversion produces tensor.insert_slice ops, rather than linalg. This may or may not be acceptable, but I'm putting the PR up first to get thoughts.

I don't believe there's an exact equivalent to insert_slice in linalg, but I think it could be achieved with a linalg.generic. However, since the StableHLO conversion inserts a bunch of tensor ops anyway, I don't think lowering to tensor.insert_slice directly is against the spirit of what already exists.
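For intuition, one way to picture the linalg.generic alternative is as a per-element select over the destination's iteration space. The Python sketch below is only an assumption about the shape such a lowering might take (1-D, in-bounds offset, hypothetical function name); it is not something this PR implements.

```python
def insert_slice_elementwise(dest, update, offset):
    """For each output index, take the update value if the index falls inside
    the inserted window, otherwise keep the destination value."""
    n = len(update)
    return [
        update[i - offset] if offset <= i < offset + n else dest[i]
        for i in range(len(dest))
    ]


result = insert_slice_elementwise([0, 0, 0, 0, 0], [7, 8], 2)
```

This formulation visits every element of the destination rather than only the inserted window.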

All other cases of scatter will be left as-is, but given it's quite a complex op, this pattern will provide near-term utility while a more general purpose lowering is cooked up.

Note to reviewers: this is my first contribution to the project, so there may be some workflow things I've missed.

@Wheest Wheest marked this pull request as ready for review April 8, 2025 10:03
@Wheest Wheest changed the title from "Add lowering for insert_slice-like scatter ops (KV-cache) #310" to "Add lowering for insert_slice-like scatter ops (KV-cache)" Apr 8, 2025