[WIP] Quartet QAT support #38696


Open · wants to merge 1 commit into base: main

Conversation

BlackSamorez (Contributor)

This PR adds support for the Quartet QAT method.

The goal of this PR is to integrate inference and training support for the Quartet QAT method. This would allow both the forward and backward passes to be performed in MXFP4, enabling very fast training on Blackwell GPUs.

Currently, we're working on the kernels here, here, and here (some of the libs aren't public yet). We're planning to release the first version of the kernels this week and to have performance optimized by the end of June.
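
For readers unfamiliar with low-precision QAT, here is a rough, self-contained sketch of the idea (not the PR's code): fake-quantize both matmul operands to MXFP4 (E2M1 elements sharing one power-of-two scale per 32-element block) in the forward pass, and use a straight-through estimator in the backward pass. All names here (fake_quantize_mxfp4, QuartetStyleLinear) are illustrative, not from this PR:

    import torch

    # Positive magnitudes representable in FP4 E2M1 (the MXFP4 element format).
    _E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

    def fake_quantize_mxfp4(x, block_size=32):
        # Round-trip x through MXFP4: one shared power-of-two scale per block
        # (assumes x.numel() is divisible by block_size).
        flat = x.reshape(-1, block_size)
        amax = flat.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
        # Largest power of two that maps the block max onto the E2M1 max (6.0).
        scale = torch.exp2(torch.floor(torch.log2(amax / 6.0)))
        scaled = (flat / scale).clamp(-6.0, 6.0)
        grid = _E2M1_GRID.to(device=x.device, dtype=x.dtype)
        # Round each magnitude to the nearest representable grid point.
        idx = (scaled.abs().unsqueeze(-1) - grid).abs().argmin(dim=-1)
        return (grid[idx] * scaled.sign() * scale).reshape_as(x)

    class QuartetStyleLinear(torch.autograd.Function):
        # Straight-through estimator: quantized matmul in the forward pass,
        # gradients computed as if the quantizer were the identity.
        @staticmethod
        def forward(ctx, x, weight):  # x: (n, in), weight: (out, in)
            ctx.save_for_backward(x, weight)
            return fake_quantize_mxfp4(x) @ fake_quantize_mxfp4(weight).t()

        @staticmethod
        def backward(ctx, grad_out):
            x, weight = ctx.saved_tensors
            # Quartet's point is that these two matmuls also run in MXFP4 on
            # Blackwell hardware; full precision keeps the sketch short.
            return grad_out @ weight, grad_out.t() @ x

In the PR itself, this logic presumably lives behind QuartetLinear and dispatches to the qutlass kernels rather than eager PyTorch.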

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@Rocketknight1 (Member)

cc @MekkCyber

@MekkCyber (Contributor) left a comment

Hi @BlackSamorez! Thanks a lot for this addition 🤗! Left a few comments!

@@ -0,0 +1,49 @@
# Copyright 2024 The HuggingFace Team. All rights reserved.
@MekkCyber (Contributor)

Suggested change:
- # Copyright 2024 The HuggingFace Team. All rights reserved.
+ # Copyright 2025 The HuggingFace Team. All rights reserved.

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"HIGGS through FLUTE (Flexible Lookup Table Engine for LUT-quantized LLMs) integration file"
@MekkCyber (Contributor)

Suggested change:
- "HIGGS through FLUTE (Flexible Lookup Table Engine for LUT-quantized LLMs) integration file"
+ "Quartet QAT integration file"

Comment on lines +22 to +24
if is_torch_available():
pass

@MekkCyber (Contributor)

we don't need this

@@ -0,0 +1,164 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
@MekkCyber (Contributor)

Suggested change:
- # Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+ # Copyright 2025 The HuggingFace Inc. team. All rights reserved.

Comment on lines +36 to +38
Quantizer of the HIGGS method. Enables the loading of prequantized models and in-flight quantization of full-precision models.
"""

@MekkCyber (Contributor)

To be updated; this docstring still describes the HIGGS method.
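
Presumably something along these lines, mirroring the docstring it replaces (wording is a suggestion, not from the PR):

    """
    Quantizer of the Quartet QAT method. Enables the loading of prequantized
    models and in-flight quantization of full-precision models.
    """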

Comment on lines +1163 to +1165
def is_qutlass_available():
return _qutlass_available

@MekkCyber (Contributor)

I can't find a distribution for qutlass; is it not released yet?
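
For reference, transformers typically derives such flags from importlib at import time; a minimal sketch of that pattern, assuming the package will eventually be installable as qutlass:

    import importlib.util

    # Computed once at import time, following the pattern used for other
    # optional backends; "qutlass" as the installed package name is an
    # assumption, since no public distribution exists yet.
    _qutlass_available = importlib.util.find_spec("qutlass") is not None

    def is_qutlass_available():
        return _qutlass_available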

Comment on lines +125 to +128
for name, module in tqdm(quartet_qat_modules.items(), desc="Pre-processing Quartet QAT modules", leave=False):
pass
# module.pre_forward()

@MekkCyber (Contributor)

What's meant to happen here, exactly?

Comment on lines +160 to +164
if isinstance(module, QuartetLinear) and tensor_name == "weight":
# Only quantize weights of QuartetLinear modules that are not already quantized
return True
else:
return False

@MekkCyber (Contributor)

Is the bias quantized too?

Comment on lines +96 to +97
assert isinstance(module, QuartetLinear), f"Module {param_name} is not a QuartetLinear somehow..."

@MekkCyber (Contributor)

No need for an assert here; we can just raise an error instead.
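
For instance, the check could raise explicitly (a sketch reusing the names from the snippet above):

    if not isinstance(module, QuartetLinear):
        raise TypeError(
            f"Expected the module owning {param_name} to be a QuartetLinear, "
            f"got {type(module).__name__}."
        )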

Comment on lines +99 to +100
module.pre_forward()

@MekkCyber (Contributor)

What's happening here?

@BlackSamorez (Contributor, Author)

  1. Hadamard transform matrix initialization on the correct devices.
  2. Since it's a QAT method, we may or may not want to keep a full-precision copy of the weights. If the full-precision copy isn't needed, this function also deletes the .weight parameter after quantizing it. Here's the code.
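
Based on that description, pre_forward could look roughly like the following sketch; make_hadamard_matrix, quantize_mxfp4, and keep_master_weights are illustrative names, not the implementation linked above:

    def pre_forward(self):
        # 1. Lazily build the Hadamard transform matrix on the same
        #    device/dtype as the weight (hypothetical helper).
        if self.hadamard_matrix is None:
            self.hadamard_matrix = make_hadamard_matrix(
                self.in_features, device=self.weight.device, dtype=self.weight.dtype
            )
        # 2. Quantize the weight once; optionally drop the full-precision
        #    master copy when QAT doesn't need it for optimizer updates.
        if self.quantized_weight is None:
            self.quantized_weight = quantize_mxfp4(self.weight @ self.hadamard_matrix)
            if not self.keep_master_weights:
                del self.weight  # nn.Module drops it from _parameters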

@SunMarc self-requested a review June 12, 2025 15:31