[DRAFT][s4xbf16] Add shmem swizzling heuristics for loading into LinearLayouts (Rebased) #24

Open: wants to merge 2 commits into base: llvm-head-staging

@ggengnv ggengnv commented Apr 9, 2025

This is 1 of the 2 patches needed to improve int4xbf16 GEMM perf.

This improves shared-memory (shmem) swizzling when loading into LinearLayouts. The change is needed because join/reshape, which efficient int4 upcasting relies on, propagates a LinearLayout rather than a DotOp layout. Triton currently falls back to an unswizzled shmem layout in this case, which is suboptimal.

This PR adds high-level heuristics to generate a swizzled layout for the above case.
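
For readers unfamiliar with why an unswizzled layout is suboptimal, here is a minimal, generic sketch of XOR swizzling. It is not the heuristic added in this PR and does not use any Triton API. The 32-bank, 4-byte-per-bank model, the 32x32 tile, and the helper names `bank` and `max_conflict` are all assumptions made for illustration.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, one 4-byte element per bank slot
ROW_LEN = 32    # assumption: 32x32 tile of 4-byte elements, row-major


def bank(row: int, col: int, swizzle: bool) -> int:
    """Return the bank that element (row, col) of the tile lands in."""
    if swizzle:
        # XOR the column with the row index so that a fixed logical
        # column maps to a different physical column in every row.
        col ^= row % ROW_LEN
    return (row * ROW_LEN + col) % NUM_BANKS


def max_conflict(col: int, swizzle: bool) -> int:
    """Worst-case number of threads hitting one bank when 32 threads
    each read a different row of the same logical column."""
    hits = Counter(bank(r, col, swizzle) for r in range(ROW_LEN))
    return max(hits.values())


if __name__ == "__main__":
    print("unswizzled:", max_conflict(0, swizzle=False))
    print("swizzled:  ", max_conflict(0, swizzle=True))
```

Under this toy model, a column read from the unswizzled tile sends every thread to the same bank, while the XOR-swizzled tile spreads the same read across all banks. Real layouts also have to account for vector width and element size, which this sketch ignores.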

cc @gflegar @loislo

abulavin and others added 2 commits April 3, 2025 16:37
Updating LLVM in order to pull in the following change:

- llvm/llvm-project#128566

For context, crash reproduction generation in MLIR runs the
`PassManager`'s passes in a child thread. The above PR fixes crashes that
occur when passes such as `add_di_scope` add `DistinctAttr` to the IR and
their storage is accessed later, after the child thread has joined.
Pulling this in improves QoL for out-of-tree projects and makes the pass
manager more robust to the use of `DistinctAttr`.

This pin update has also introduced the deprecation of a
`llvm::TargetMachine::createTargetMachine` overload. I've updated the
callsites to use the non-deprecated overloads.

- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
  [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.

- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because `this PR only updates the
    LLVM pin, so CI is sufficient`.

- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best
    practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
    including the "tests should be minimal" section. (Usually running Python
    code and using the instructions it generates is not minimal.)