
Allow for trailing 'a' in sm_arch #126185

Closed
wants to merge 1 commit from the allow-for-sm90a-lazy-load branch

Conversation

Contributor

@drisspg drisspg commented May 14, 2024

Summary

I was getting

``` Shell
File "/home/drisspg/meta/pytorch/torch/cuda/__init__.py", line 312, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: invalid literal for int() with base 10: '90a'
```

cc @ptrblck @msaroufim
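
For context, here is a minimal sketch of the parse and the kind of fix, assuming arch strings of the form `sm_<num>` with an optional trailing `a` as reported by `torch.cuda.get_arch_list()`; `parse_sm_arch` is an illustrative helper, not the exact code in `torch/cuda/__init__.py`:

``` Python
# Illustrative only: int("90a") raises the ValueError shown above, so the
# trailing architecture-specific suffix (e.g. sm_90a on Hopper) must be
# stripped before the integer conversion.
def parse_sm_arch(arch: str) -> int:
    num = arch.split("_")[1]   # "90" or "90a"
    if num.endswith("a"):      # allow arch-specific variants such as sm_90a
        num = num[:-1]
    return int(num)

assert parse_sm_arch("sm_90") == 90
assert parse_sm_arch("sm_90a") == 90  # previously: ValueError
```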

@drisspg drisspg requested a review from eqy as a code owner May 14, 2024 16:52

pytorch-bot bot commented May 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126185

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 6b91199 with merge base ed76079:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg drisspg added the module: cuda, topic: bug fixes, and topic: not user facing labels May 14, 2024
@drisspg drisspg requested a review from malfet May 14, 2024 16:52
Contributor Author

drisspg commented May 14, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label May 14, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: Raised by workflow job

Failing merge rule: Core Maintainers

@drisspg drisspg force-pushed the allow-for-sm90a-lazy-load branch from 539c34e to 6b91199 Compare May 14, 2024 21:32
Contributor Author

drisspg commented May 14, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: Raised by workflow job

Failing merge rule: Core Maintainers

Contributor Author

drisspg commented May 15, 2024

@pytorchbot merge -f "unrelated failures:


pytorch-bot bot commented May 15, 2024

❌ 🤖 pytorchbot command failed:

```
Got EOF while in a quoted string
```

Try `@pytorchbot --help` for more info.

Contributor Author

drisspg commented May 15, 2024

@pytorchbot merge -f "unrelated failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

ZelboK pushed a commit to ZelboK/pytorch that referenced this pull request May 19, 2024
# Summary
I was getting
``` Shell
File "/home/drisspg/meta/pytorch/torch/cuda/__init__.py", line 312, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: invalid literal for int() with base 10: '90a'
```

Pull Request resolved: pytorch#126185
Approved by: https://github.com/Skylion007
pytorchmergebot pushed a commit that referenced this pull request May 31, 2024
# Summary
This pull request introduces an fp8 row-scaling kernel as an optional implementation for `scaled_mm`. The kernel selection is based on the scaling tensors of the inputs. For inputs `x` and `y` of shape `[M, K]` and `[K, N]` respectively, the following conditions must be met:
- `x`'s scale should be a 1-dimensional tensor of length `M`.
- `y`'s scale should be a 1-dimensional tensor of length `N`.

It's important to note that this kernel is not called "rowwise, columnwise" scaling because, although the scales for `y` are semantically along its columns, this implementation only supports the TN format. This means the scaling is along the faster-moving dimension, or the "row".
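
As a hedged sketch of those shape conditions (illustrative helper, not the actual `scaled_mm` dispatch code):

``` Python
import torch

# Hypothetical shape check mirroring the conditions above; not PyTorch API.
def is_rowwise_scaling(x: torch.Tensor, y: torch.Tensor,
                       scale_x: torch.Tensor, scale_y: torch.Tensor) -> bool:
    M, N = x.shape[0], y.shape[1]  # x: [M, K], y: [K, N]
    return (scale_x.dim() == 1 and scale_x.numel() == M
            and scale_y.dim() == 1 and scale_y.numel() == N)
```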

The following two PRs were required to enable local builds:
- [PR #126185](#126185)
- [PR #125523](#125523)

### Todo
We still do not build our Python wheels with this architecture.

@ptrblck @malfet, should we replace `sm_90` with `sm_90a`?

The NVRTC TMA shadowing feels wrong, but I am not sure of the right way to spoof the symbol for this compilation unit:
https://github.com/pytorch/pytorch/pull/125204/files#r1586986954

#### ifdef

I tried to use

``` C
#if !defined(USE_ROCM) && defined(CUDA_VERSION) && CUDA_VERSION >= 12000 && \
    defined(__CUDA_ARCH__) && __CUDA_ARCH__ > 900
```

to gate the building of the kernel. I was having a hell of a time with this, so I am not really sure of the right way to do it.
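
For comparison, a runtime-side gate from Python is straightforward; this is a hedged sketch using the device's compute capability, not the compile-time guard discussed above:

``` Python
import torch

# torch.cuda.get_device_capability() returns (major, minor);
# sm_90/sm_90a (Hopper) corresponds to major == 9.
def rowwise_fp8_supported() -> bool:
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 9
```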

Kernel Credit:
@jwfromm

Pull Request resolved: #125204
Approved by: https://github.com/lw
pytorchmergebot pushed a commit that referenced this pull request Jun 5, 2024
petrex pushed a commit to petrex/pytorch that referenced this pull request Jun 5, 2024
Labels
ciflow/trunk, Merged, module: cuda, topic: bug fixes, topic: not user facing

3 participants