forked from jax-ml/jax
CI: 06/10/25 upstream sync #464
Closed
PiperOrigin-RevId: 764019664
PiperOrigin-RevId: 764148812
This helps with performance a bit (we only allocate and deallocate TMEM once in each SM), and opens up the opportunity for better overlapping of the epilogue. PiperOrigin-RevId: 764168230
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, and leads to improved build and iteration times. This required moving a couple `jax.numpy` imports into local functions. These could probably be addressed by moving the registrations elsewhere. PiperOrigin-RevId: 764170653
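The "imports moved into local functions" pattern mentioned above can be sketched as follows. This is only an illustration of the general technique, not the actual JAX change: the function name `encode_config` is hypothetical, and the standard-library `json` module stands in for `jax.numpy`.

```python
def encode_config(config):
    """Serialize a config dict (hypothetical helper, for illustration only)."""
    # Deferred import: `json` is loaded the first time this function runs,
    # not when the enclosing module is imported. The enclosing module
    # therefore has no import-time dependency on `json`, which is what lets
    # a build rule drop that dependency edge.
    import json

    return json.dumps(config, sort_keys=True)


print(encode_config({"b": 1, "a": 2}))
```

The trade-off is that the dependency becomes invisible at module level, which is presumably why the commit message suggests moving the registrations elsewhere as a cleaner fix.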
… TMEM Otherwise one block can begin the deallocation process before the other is done using it. PiperOrigin-RevId: 764173760
PiperOrigin-RevId: 764182705
http://github.com/openxla/xla/commit/acc83e32f93d83280d3672aa9194847a3d416b06. PiperOrigin-RevId: 764183483
…d layouts Any tile-aligned slicing is easy to handle. PiperOrigin-RevId: 764189366
This allows us to prime the GMEM->SMEM pipeline for the next tile while storing the SMEM->GMEM tile for the current one. However, this implies that we can no longer share the same SMEM region between the MMA pipeline and the epilogue, which pushes the SMEM pressure so high that we can't fetch many steps into the future. Overall the performance is slightly worse than for the baseline kernel, but it recovers and improves upon it in the follow-up. PiperOrigin-RevId: 764220403
This reworks the previous scheme by transferring all of TMEM to registers at once, and then doing RMEM->SMEM->GMEM in multiple phases, allowing us to use a smaller SMEM buffer. This, in turn, lets us bump max_concurrent_steps for the MMA pipeline which increases performance considerably. The only downside of this scheme is that even though it should be technically feasible to perform the epilogue with 255 registers per thread, ptxas generates a number of spills that might be lowering our performance. Either way, it's still better than the previous alternatives. PiperOrigin-RevId: 764249234
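The multi-phase store described above can be modelled in plain Python. This is a toy host-side sketch, not the Mosaic GPU kernel: lists stand in for registers, SMEM and GMEM, and the name `store_in_phases` and the `staging_size` parameter are assumptions made for illustration.

```python
def store_in_phases(registers, staging_size):
    """Copy `registers` to a fresh `gmem` list through a small staging buffer.

    Returns the destination list and the number of phases used.
    """
    gmem = [None] * len(registers)
    smem = [None] * staging_size  # stands in for the smaller SMEM buffer
    phases = 0
    for start in range(0, len(registers), staging_size):
        chunk = registers[start:start + staging_size]
        n = len(chunk)
        smem[:n] = chunk                  # RMEM -> SMEM
        gmem[start:start + n] = smem[:n]  # SMEM -> GMEM
        phases += 1
    return gmem, phases


tile = list(range(10))
out, phases = store_in_phases(tile, staging_size=4)
print(out == tile, phases)
```

The staging buffer only needs to hold one chunk at a time, so its size is independent of the tile size. That is the property the commit message relies on to free SMEM for more concurrent MMA pipeline steps.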
This replaces the old scheme that still included a bit of a bubble at the end of each tile with a new scheme that should be entirely bubble-free, for as long as the MMA loop is long enough to hide the store latency (i.e. for big enough K dimensions). This also removes the problems with spills we had in the previous version since the register footprint is relatively small now. PiperOrigin-RevId: 764256446
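The condition "the MMA loop is long enough to hide the store latency" can be expressed with a deliberately simplified timing model. Everything here is an assumption for illustration: the function name, the parameters, and the idea that the store of one tile overlaps perfectly with the MMA loop of the next. No real measurements are implied.

```python
def bubble_per_tile(k_steps, time_per_step, store_time):
    """Idle time per tile in a toy model where the store of the current tile
    overlaps with the MMA loop of the next tile."""
    mma_time = k_steps * time_per_step
    # If the MMA loop outlasts the store, the store is fully hidden.
    return max(0, store_time - mma_time)


# Hypothetical unit-less timings: a short K loop leaves a bubble, a long one does not.
print(bubble_per_tile(k_steps=4, time_per_step=10, store_time=100))
print(bubble_per_tile(k_steps=16, time_per_step=10, store_time=100))
```

In this model the bubble vanishes once `k_steps * time_per_step >= store_time`, which corresponds to the "big enough K dimensions" caveat in the commit message.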
…mosaic:GPU can get access to the device ids in the mesh PiperOrigin-RevId: 764263324
XLA dumps one more HLO file by default, which leads to one more PGLE profile file. PiperOrigin-RevId: 764274080
All Triton-specific APIs are always used qualified, e.g. `plgpu.TritonCompilerParams`, so the prefix is redundant. PiperOrigin-RevId: 764276165
…l.run_state` PiperOrigin-RevId: 764276682
PiperOrigin-RevId: 764328992
PiperOrigin-RevId: 764330041
Resolve an issue where `jax.devices()` hangs due to an unwanted TPU metadata query when using LibTPU with a device other than a TPU (e.g. a CPU). This feature can be useful in cross [AOT](https://docs.jax.dev/en/latest/aot.html).
PiperOrigin-RevId: 764354256
PiperOrigin-RevId: 764376257
This strips away the redundant terms in job names to keep them shorter and easier to read. Actions displays the names of jobs that reuse workflows in the following format: `caller workflow name / called workflow name`. The changes here are made to the called workflow names, because changing the caller workflow names seems to make the summary page hard to parse (see https://github.com/jax-ml/jax/actions/runs/15217612585). Here's what the continuous workflow's summary page looks like with this change: https://github.com/jax-ml/jax/actions/runs/15286609214/job/42998511666 PiperOrigin-RevId: 764390866
PiperOrigin-RevId: 764408327
PiperOrigin-RevId: 764419062
…uild container to us-docker.pkg.dev/ml-oss-artifacts-published/ml-public-container/ml-build. These containers are the same (same build script); they are just in a different repository. PiperOrigin-RevId: 764435895
Updates LLVM usage to match [2b8bff6f66fd](llvm/llvm-project@2b8bff6f66fd) PiperOrigin-RevId: 764439621
PiperOrigin-RevId: 764449037
PiperOrigin-RevId: 764490044
PiperOrigin-RevId: 768932310
…nal_doc PiperOrigin-RevId: 769139444
…ve_scan_reverse_argument_order PiperOrigin-RevId: 769139664
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, and leads to improved build and iteration times. This required a few local imports and refactors. PiperOrigin-RevId: 769184594
PiperOrigin-RevId: 769203956
PiperOrigin-RevId: 769214784
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, prevents use of internal APIs, and leads to improved build and iteration times. PiperOrigin-RevId: 769236580
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, prevents use of internal APIs, and leads to improved build and iteration times. PiperOrigin-RevId: 769249808
'exectuable' should be 'executable'. PiperOrigin-RevId: 769256903
… to result in a ptxas miscompilation (between 12.8.0 and 12.9.1). PiperOrigin-RevId: 769257583
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, prevents use of internal APIs, and leads to improved build and iteration times. PiperOrigin-RevId: 769264747
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, prevents use of internal APIs, and leads to improved build and iteration times. PiperOrigin-RevId: 769280698
PiperOrigin-RevId: 769301226
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, prevents use of internal APIs, and leads to improved build and iteration times. PiperOrigin-RevId: 769320414
…fixes PiperOrigin-RevId: 769341663
Creating smaller build rules enforces better organized dependency graphs in the JAX project, helps pytype propagate annotations correctly, prevents use of internal APIs, and leads to improved build and iteration times. PiperOrigin-RevId: 769356578
PiperOrigin-RevId: 769376932
PiperOrigin-RevId: 769385812
…Assignment values. PiperOrigin-RevId: 769417940
It is probably a more useful default behavior not to implicitly inline everything. PiperOrigin-RevId: 769452443
Daily sync with upstream