Add fp8 row/block-wise scaled GEMMs #2546

Open
wants to merge 1 commit into
base: main
from

choutim
Contributor

@choutim choutim commented Apr 29, 2024

Summary:
Add fp8 row-wise and block-wise scaled GEMM kernels, with tests and benchmarks. The benchmark will be registered with TritonBench in a separate PR.
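For readers unfamiliar with the term, the following is a minimal reference sketch of what a row-wise scaled GEMM computes. It is not the kernel added in this PR. It is pure Python, performs no real fp8 rounding, and rests on assumptions that the PR text does not confirm: the e4m3 format (largest finite value 448), a weight matrix stored as (N, K), and the hypothetical helper names `rowwise_scale` and `rowwise_scaled_gemm`.

```python
# Reference semantics only: pure Python, no real fp8 rounding, no GPU.
# ASSUMPTION: e4m3 format, whose largest finite value is 448.
FP8_E4M3_MAX = 448.0


def rowwise_scale(mat):
    """Return (scaled rows, per-row scales) so each row fits in [-448, 448]."""
    scaled, scales = [], []
    for row in mat:
        amax = max(abs(v) for v in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        # A real kernel would cast the scaled values to fp8 at this point.
        scaled.append([v / scale for v in row])
    return scaled, scales


def rowwise_scaled_gemm(xq, wq, x_scale, w_scale):
    """Compute xq @ wq^T, then rescale entry (i, j) by x_scale[i] * w_scale[j]."""
    out = []
    for i, x_row in enumerate(xq):
        out_row = []
        for j, w_row in enumerate(wq):
            acc = sum(a * b for a, b in zip(x_row, w_row))
            out_row.append(acc * x_scale[i] * w_scale[j])
        out.append(out_row)
    return out


x = [[1.0, -2.0, 3.0], [0.5, 0.25, -0.125]]  # activations, shape (M=2, K=3)
w = [[2.0, 0.0, 1.0], [-1.0, 4.0, 0.5]]      # weights, shape (N=2, K=3) (assumed layout)
xq, xs = rowwise_scale(x)
wq, ws = rowwise_scale(w)
y = rowwise_scaled_gemm(xq, wq, xs, ws)      # approximates x @ w^T
```

Because the sketch skips the fp8 cast, `y` matches the unscaled product up to floating-point error; in a real kernel the cast introduces quantization error. A block-wise variant would keep one scale per tile of the matrix instead of one per row.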

H100 500W
```
bf16:                   shape (8192, 8192, 8192)   tflops 585.23   ms 1.879
fp8 scale + row gemm:   shape (8192, 8192, 8192)   tflops 931.80   ms 1.180
fp8 scale + block gemm: shape (8192, 8192, 8192)   tflops 594.84   ms 1.848
fp8 row gemm only:      shape (8192, 8192, 8192)   tflops 1125.51  ms 0.977
fp8 block gemm only:    shape (8192, 8192, 8192)   tflops 870.40   ms 1.263

bf16:                   shape (65536, 8192, 7168)  tflops 575.12   ms 13.383
fp8 scale + row gemm:   shape (65536, 8192, 7168)  tflops 1024.09  ms 7.516
fp8 scale + block gemm: shape (65536, 8192, 7168)  tflops 762.04   ms 10.100
fp8 row gemm only:      shape (65536, 8192, 7168)  tflops 1082.75  ms 7.108
fp8 block gemm only:    shape (65536, 8192, 7168)  tflops 828.34   ms 9.292

bf16:                   shape (65536, 3584, 8192)  tflops 546.31   ms 7.044
fp8 scale + row gemm:   shape (65536, 3584, 8192)  tflops 876.66   ms 4.390
fp8 scale + block gemm: shape (65536, 3584, 8192)  tflops 547.62   ms 7.027
fp8 row gemm only:      shape (65536, 3584, 8192)  tflops 1141.38  ms 3.372
fp8 block gemm only:    shape (65536, 3584, 8192)  tflops 828.31   ms 4.646
```
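The tflops column appears consistent with the usual convention of counting 2·M·N·K floating-point operations per GEMM. The sketch below checks that arithmetic against two rows of the table. The helper name `gemm_tflops` is hypothetical, and the (M, N, K) ordering of the shape tuple is an assumption, although the result does not depend on it because only the product matters.

```python
def gemm_tflops(m, n, k, ms):
    """TFLOPS for an (M, N, K) GEMM, counting 2*M*N*K floating-point operations."""
    flops = 2 * m * n * k
    return flops / (ms * 1e-3) / 1e12


shape = (8192, 8192, 8192)
bf16_ms = 1.879      # "bf16" row
fp8_row_ms = 1.180   # "fp8 scale + row gemm" row

bf16_tflops = gemm_tflops(*shape, bf16_ms)
fp8_row_tflops = gemm_tflops(*shape, fp8_row_ms)
speedup = bf16_ms / fp8_row_ms  # wall-clock speedup of fp8 row-wise over bf16

print(f"bf16 {bf16_tflops:.2f} TFLOPS, fp8 row {fp8_row_tflops:.2f} TFLOPS, "
      f"speedup {speedup:.2f}x")
```

The computed values agree with the reported 585.23 and 931.80 to within the rounding of the ms column, and the corresponding speedup for this shape is roughly 1.6x.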

Differential Revision: D56337896

netlify bot commented Apr 29, 2024

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit 1a67eb7
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-fbgemm-docs/deploys/6630073caea37b0008d79465
😎 Deploy Preview https://deploy-preview-2546--pytorch-fbgemm-docs.netlify.app
