[DRAFT][s4xbf16] Add shmem swizzling heuristics for loading into LinearLayouts (Rebased) #24

ggengnv · 2025-04-09T20:38:10Z

This is 1 of the 2 patches needed to improve int4xbf16 GEMM perf.

This patch improves shared-memory (shmem) swizzling for loads into LinearLayouts. It is needed because join/reshape, which efficient int4 upcasting relies on, causes the propagated layout to be a LinearLayout rather than a DotOp layout. Currently Triton falls back to an unswizzled shmem layout in this case, which is suboptimal.

This PR adds high-level heuristics to generate a swizzled layout for the above case.
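For readers unfamiliar with why an unswizzled layout is suboptimal, here is a minimal, generic sketch of XOR swizzling. It is not the heuristic added by this PR. It assumes a 32x32 tile of 4-byte elements and 32 shared-memory banks; the function names are hypothetical and exist only for illustration.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, each one 4-byte word wide


def bank_unswizzled(row, col, row_len=32):
    # Row-major placement: the bank depends only on the linear word offset.
    return (row * row_len + col) % NUM_BANKS


def bank_xor_swizzled(row, col, row_len=32):
    # XOR the column with the row index so that a fixed logical column
    # lands in a different physical column (and bank) on every row.
    phys_col = col ^ (row % row_len)
    return (row * row_len + phys_col) % NUM_BANKS


def max_conflict_degree(bank_fn, col=0, rows=32):
    # Model 32 threads each reading one element of the same logical column
    # and count how many of them hit the most contended bank.
    counts = Counter(bank_fn(r, col) for r in range(rows))
    return max(counts.values())


unswizzled = max_conflict_degree(bank_unswizzled)
swizzled = max_conflict_degree(bank_xor_swizzled)
```

In this toy model, a column read from the unswizzled tile sends all 32 accesses to one bank, while the XOR-swizzled tile spreads them across 32 distinct banks. A real heuristic must also account for element width, vectorization, and the register layout that the data is loaded into.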

cc @gflegar @loislo

Updating LLVM in order to pull in the following change:

- llvm/llvm-project#128566

For context, crash reproduction generation in MLIR will run the `PassManager`'s passes in a child thread. The above PR fixes crashes for when passes such as `add_di_scope` add `DistinctAttr` to the IR and their storage is then accessed later once the child thread joins. Pulling this in improves QoL for out-of-tree projects and makes the pass manager more robust to the use of `DistinctAttr`.

This pin update has also introduced the deprecation of a `llvm::TargetMachine::createTargetMachine` overload. I've updated the callsites to use the non-deprecated overloads.

- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these [rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for `lit` tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because `this PR only updates the LLVM pin, so CI is sufficient`.
- Select one of the following.
  - [x] I have not added any `lit` tests.
  - [ ] The `lit` tests I have added follow these [best practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices), including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)

abulavin and others added 2 commits April 3, 2025 16:37

Add shmem swizzling heuristic for LL

a6f553c

ggengnv mentioned this pull request Apr 9, 2025

[DRAFT][s4xbf16] Add shmem swizzling heuristics for loading into LinearLayouts #23

Closed

vwbaker force-pushed the llvm-head-staging branch from c629b06 to 017162e on April 22, 2025 14:35

gflegar force-pushed the llvm-head-staging branch from a4f5b2f to fe66e41 on May 6, 2025 13:10
