[WIP] Quartet QAT support #38696
Conversation
cc @MekkCyber
Hi @BlackSamorez! Thanks a lot for this addition 🤗! Left a few comments!
@@ -0,0 +1,49 @@
# Copyright 2024 The HuggingFace Team. All rights reserved.
Suggested change:
- # Copyright 2024 The HuggingFace Team. All rights reserved.
+ # Copyright 2025 The HuggingFace Team. All rights reserved.
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"HIGGS through FLUTE (Flexible Lookup Table Engine for LUT-quantized LLMs) integration file"
"HIGGS through FLUTE (Flexible Lookup Table Engine for LUT-quantized LLMs) integration file" | |
"Quartet QAT integration file" |
if is_torch_available():
    pass
we don't need this
@@ -0,0 +1,164 @@
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
Suggested change:
- # Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+ # Copyright 2025 The HuggingFace Inc. team. All rights reserved.
Quantizer of the HIGGS method. Enables the loading of prequantized models and in-flight quantization of full-precision models.
"""
to be updated
def is_qutlass_available():
    return _qutlass_available
I can't find a distribution for qutlass, is it not released yet?
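For context, availability flags like this in Transformers integrations are usually set by probing for the package rather than importing it. A minimal sketch of that pattern, assuming the library will eventually ship as a package named qutlass:

import importlib.util

# Probe for the (not yet released) qutlass package without importing it.
_qutlass_available = importlib.util.find_spec("qutlass") is not None

def is_qutlass_available():
    return _qutlass_available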
for name, module in tqdm(quartet_qat_modules.items(), desc="Pre-processing Quartet QAT modules", leave=False):
    pass
    # module.pre_forward()
What's meant to happen here exactly?
if isinstance(module, QuartetLinear) and tensor_name == "weight":
    # Only quantize weights of QuartetLinear modules that are not already quantized
    return True
else:
    return False
is the bias quantized too?
assert isinstance(module, QuartetLinear), f"Module {param_name} is not a QuartetLinear somehow..."
no need for assert here, or we can just raise an error instead
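For reference, a minimal sketch of that suggestion, reusing the names from the quoted line (module, param_name, QuartetLinear):

if not isinstance(module, QuartetLinear):
    raise ValueError(f"Module {param_name} is expected to be a QuartetLinear, got {type(module).__name__}.")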
module.pre_forward()
what's happening here?
- Hadamard transform matrix initialization on the correct devices.
- Since it's a QAT method, we might or might not want to keep a full-precision weight copy. If we don't need the full-precision weight copy, this function also deletes the .weight parameter after quantizing it. Here's the code.
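As a rough, self-contained illustration of those two steps (not the PR's actual implementation; scipy's hadamard and the coarse fake-quantization grid below are stand-ins):

import torch
from scipy.linalg import hadamard  # stand-in Hadamard construction (requires power-of-two sizes)


class QuartetLinearSketch(torch.nn.Linear):
    keep_master_weights: bool = True  # whether to keep the full-precision weight for QAT

    def pre_forward(self):
        # 1. Build the Hadamard transform matrix on the same device/dtype as the weight,
        #    so the rotation never hits a device mismatch at forward time.
        h = torch.as_tensor(hadamard(self.in_features), dtype=self.weight.dtype, device=self.weight.device)
        self.hadamard = h / self.in_features ** 0.5

        # 2. Quantize the rotated weight. Real Quartet uses MXFP4 kernels; here we only
        #    fake-quantize to a coarse grid so the sketch stays dependency-free.
        w_rot = self.weight.detach() @ self.hadamard
        scale = w_rot.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 6.0
        self.weight_q = torch.round(w_rot / scale).clamp(-6, 6) * scale

        # If no full-precision master copy is needed (e.g. pure inference), drop the
        # original .weight parameter to free memory.
        if not self.keep_master_weights:
            del self.weight

For example, QuartetLinearSketch(128, 128).pre_forward() runs both steps on CPU and drops nothing, since keep_master_weights defaults to True.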
This PR adds support for the Quartet QAT method.
The goal of this PR is to integrate inference and training support for the Quartet QAT method. That would allow performing both forward and backward passes in MXFP4, enabling very fast training on Blackwell GPUs.
Currently, we're working on the kernels here, here and here (some of the libs aren't public yet). We're planning to release the first version of the kernels this week and to have optimized performance by the end of June.
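If the integration follows the usual Transformers quantization pattern, end-user usage would presumably look something like the sketch below; the QuartetConfig name and its fields are assumptions, not the PR's final API:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import QuartetConfig  # hypothetical config class name for this PR

# Hypothetical: the exact fields (group size, forward/backward dtypes, ...) are still TBD in the PR.
quant_config = QuartetConfig()

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",         # any causal LM checkpoint
    quantization_config=quant_config,  # standard entry point for quantizer integrations
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")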
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.