Add recipe: progressive optimization of matrix multiplication in custom ops #13
Conversation
Awesome! Just one small thing:
    "https://conda.modular.com/max",
    "https://repo.prefix.dev/modular-community",
]
platforms = ["linux-64", "osx-arm64", "linux-aarch64"]
osx-arm64 needs to be removed.
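If the reviewer's suggestion were applied, the platform list would drop the macOS entry. A sketch of the resulting fragment, assuming this lives in the recipe's pixi configuration (the surrounding table name is not shown in the diff):

```toml
channels = [
    "https://conda.modular.com/max",
    "https://repo.prefix.dev/modular-community",
]
platforms = ["linux-64", "linux-aarch64"]
```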
You'd pointed out the same thing on the last one, but why should we remove that platform support? These examples do build and run on macOS, but they fall back on the CPU path for naive implementations. I know they're aimed at Linux where you'll have GPU support, but they can at least be looked at locally on macOS.
It goes against the GPU story. Fine if we want to show on CPU.
    bench_matmul_kernel["tiled_register"]()
    bench_matmul_kernel["block_tiled"]()
    bench_matmul_kernel["block_tiled_vectorized"]()
    _ = gpu_ctx
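The calls above dispatch one benchmark per optimization variant via a compile-time string parameter. A rough Python analogy of the same name-keyed dispatch pattern, with a hypothetical naive variant standing in for the real kernels:

```python
import time

# Hypothetical stand-in for one matmul variant; each name in VARIANTS mirrors
# one of the parameterized bench_matmul_kernel[...]() calls above.
def matmul_naive(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

VARIANTS = {"naive": matmul_naive}

def bench(name, size=32):
    """Time one variant on a small all-ones x all-twos problem."""
    a = [[1.0] * size for _ in range(size)]
    b = [[2.0] * size for _ in range(size)]
    start = time.perf_counter()
    result = VARIANTS[name](a, b)
    elapsed = time.perf_counter() - start
    return result, elapsed

result, elapsed = bench("naive")
# Each output entry is a dot product of 32 ones with 32 twos.
assert result[0][0] == 64.0
```

In Mojo the variant name is resolved at compile time, so each specialization is a separate compiled kernel rather than a runtime dictionary lookup.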
Are these needed at the end? The origin was recently added to the layout tensor, and I'm not sure about the rest.
Unfortunately, I think they're still needed. We're only in the middle of transitioning these types, and when I tried removing the manual lifetime extensions on the latest nightly this still crashes on GPU execution with CUDA illegal access errors.
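The failure mode being guarded against is a value being destroyed while an asynchronous GPU kernel still reads its buffers. A minimal Python analogy (using CPython's deterministic refcounting and a hypothetical `GpuContext` stand-in) of why a trailing `_ = gpu_ctx` keeps the value alive:

```python
import weakref

class GpuContext:
    """Stand-in for a device context whose buffers must outlive kernel launches."""
    pass

def launch(ctx):
    # In a real runtime the kernel runs asynchronously; if ctx is freed before
    # the kernel finishes, the device reads freed memory (an illegal access).
    return weakref.ref(ctx)

ctx = GpuContext()
ref = launch(ctx)
# Holding a reference past the launch (the role of `_ = gpu_ctx`) keeps it valid:
assert ref() is not None
del ctx  # dropping the last reference destroys the context immediately
assert ref() is None
```

Mojo destroys values eagerly at their last use, so without the explicit `_ = gpu_ctx` the context could be destroyed before the enqueued kernels complete.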
As discussed on the other custom ops recipes, I'll merge this to get an initial version into the repository, making it easier for others to improve the wording and code.
Adds the next recipe in the custom ops series: a progressive optimization of matrix multiplication in MAX custom ops. The example is once again drawn from the custom ops examples in the MAX repository.