Feat (gpfq): optimizing with lower diagonal matrix formulation #1172
Conversation
src/brevitas/graph/gpfq.py
Outdated
weight_orig: Tensor = self.layer.weight_orig.data
else:
    warnings.warn("Warning: GPFQ will perform better with `create_weight_orig=True`.")
    weight_orig: Tensor = weight.clone()
Why are we still creating weight_orig and cloning?
Leaving the option for a user to disable `create_weight_orig` in case of memory issues. In such a scenario, we still need to track the original floating-point weights for the update rules, but the floating-point activations aren't the true floating-point activations. This should still save memory since the duplicate weights won't be stored.
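For illustration, a rough usage sketch of the trade-off described above, assuming the `gpfq_mode` context manager exposes the `create_weight_orig` flag discussed in this thread (the calibration loop and function name are illustrative, not taken from this PR):

```python
import torch
from brevitas.graph.gpfq import gpfq_mode

def apply_gpfq(model, calib_loader, create_weight_orig=True):
    # create_weight_orig=True keeps a copy of the float weights for the update
    # rules (better accuracy); False trades some accuracy for lower memory use,
    # since the duplicate weights are not stored.
    model.eval()
    with torch.no_grad():
        with gpfq_mode(model, create_weight_orig=create_weight_orig) as gpfq:
            gpfq_model = gpfq.model
            for _ in range(gpfq.num_layers):
                for images, _ in calib_loader:
                    gpfq_model(images)
                gpfq.update()
    return model
```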
@@ -4,199 +4,40 @@
from copy import deepcopy
import math
from typing import List, Optional
import warnings
This is ok, but from now on I want to stop using `warnings` and move everything to `logging`, so we have a bit more control.
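A minimal sketch of that direction, swapping the `warnings.warn` call above for a module-level logger (the logger setup and helper name are illustrative, not part of this PR):

```python
import logging

# Module-level logger instead of the warnings module, so verbosity can be
# tuned centrally through the logging configuration.
logger = logging.getLogger(__name__)

def _warn_missing_weight_orig(create_weight_orig: bool) -> None:
    if not create_weight_orig:
        logger.warning("GPFQ will perform better with `create_weight_orig=True`.")
```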
from brevitas.graph.gptq import GPTQ
from brevitas.graph.gpxq import SUPPORTED_CONV_OP
from brevitas.graph.gpxq import SUPPORTED_TCONV_OP
from brevitas.utils.quant_utils import _CachedIO
This is only for IntQuantTensor, but I guess it's fine
The AXE implementation is currently still experimental. I will work to extend support in a future PR.
Reason for this PR
Currently, there are two GPFQ implementations: (1) `GPFQ`, which is faster for smaller models but runs into memory issues with larger models, and (2) `GPFQv2`, which avoids those memory issues with larger models. See the references below for these two formulations. There is another formulation of GPFQ that is both more memory-efficient and more compute-efficient, which this PR implements.
References for original GPFQ formulation:
Reference for memory-efficient GPFQ:
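For context, a rough sketch of the greedy per-column update that the referenced formulations build on (it follows the papers above, not the new lower-diagonal implementation in this PR; the tensor shapes and `quant_fn` helper are illustrative assumptions):

```python
import torch

def gpfq_neuron_update(w, X, Xq, quant_fn):
    """Greedy GPFQ update for one output neuron, paper-style.

    w:  float weight vector, shape [in_features]
    X:  float input activations, shape [n_samples, in_features]
    Xq: activations from the already-quantized preceding layers, same shape
    quant_fn: rounds a scalar onto the quantization grid
    """
    u = torch.zeros(X.shape[0])          # running residual of the layer output
    q = torch.zeros_like(w)
    for t in range(w.numel()):
        xt, xqt = X[:, t], Xq[:, t]
        # Choose the grid value that best cancels the accumulated residual.
        q[t] = quant_fn(torch.dot(xqt, u + w[t] * xt) / (xqt.norm() ** 2 + 1e-12))
        u = u + w[t] * xt - q[t] * xqt
    return q
```

Storing `X` and `Xq` for every layer is roughly where the memory pressure of the original formulation comes from; the v2 and lower-diagonal formulations avoid materializing the full activation matrices.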
Changes Made in this PR
Improved GPFQ implementation and removal of the previous GPFQ implementations (`GPFQ` and `GPFQv2`).
Testing Summary
Using the existing tests for GPxQ.
Risk Highlight
Checklist
The PR is targeted against the `dev` branch.