
Feat (gpfq): optimizing with lower diagonal matrix formulation #1172

Merged
10 commits merged into Xilinx:dev on Feb 13, 2025

Conversation

i-colbert (Collaborator) commented Feb 2, 2025

Reason for this PR

Currently, there are two GPFQ implementations: (1) GPFQ, which is faster for smaller models but runs into memory issues with larger models, and (2) GPFQv2, which avoids memory issues with larger models. See the links below for these two formulations.

There is another formulation of GPFQ that is both more memory-efficient and more compute-efficient, which this PR implements.

References for original GPFQ formulation:

Reference for memory-efficient GPFQ:
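
For context, a minimal sketch of the greedy, column-by-column update rule from the original GPFQ formulation. This is illustrative only: the function and tensor names are placeholders, and this is not the lower-triangular formulation added in this PR.

```python
# Illustrative sketch of the original greedy GPFQ update rule; placeholder names.
import torch
from torch import Tensor


def gpfq_row(w: Tensor, x_float: Tensor, x_quant: Tensor, quantize) -> Tensor:
    """Greedily quantize one output channel `w` (shape [T]).

    x_float: [N, T] inputs seen by the floating-point layer
    x_quant: [N, T] inputs seen by the quantized layer
    quantize: callable that rounds a scalar tensor onto the quantization grid
    """
    q = torch.zeros_like(w)
    # Running reconstruction error between float and quantized accumulations.
    u = torch.zeros(x_float.shape[0], dtype=w.dtype)
    for t in range(w.numel()):
        u = u + w[t] * x_float[:, t]
        denom = x_quant[:, t].dot(x_quant[:, t]).clamp_min(1e-12)
        q[t] = quantize(x_quant[:, t].dot(u) / denom)
        u = u - q[t] * x_quant[:, t]
    return q
```

In this naive form, the per-layer float and quantized input matrices must be kept around, which is the memory pressure the memory-efficient variants (GPFQv2 and the formulation in this PR) are designed to avoid.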

Changes Made in this PR

Adds the improved GPFQ implementation and removes the previous GPFQ and GPFQv2 implementations.

Testing Summary

Using the existing tests for GPxQ.

Risk Highlight

  • This PR includes code from another work (please detail).
  • This PR contains API-breaking changes.
  • This PR depends on work in another PR (please provide links/details).
  • This PR introduces new dependencies (please detail).
  • There are coverage gaps not covered by tests.
  • Documentation updates required in subsequent PR.

Checklist

  • Code comments added to any hard-to-understand areas, if applicable.
  • Changes generate no new warnings.
  • Updated any relevant tests, if applicable.
  • No conflicts with destination dev branch.
  • I reviewed my own code changes.
  • Initial CI/CD passing.
  • 1+ reviews given, and any review issues addressed and approved.
  • Post-review full CI/CD passing.

i-colbert added the next release label (PRs which should be merged for the next release) on Feb 3, 2025
src/brevitas/graph/gpfq.py — resolved (outdated) review threads
# Excerpt from src/brevitas/graph/gpfq.py (context for the thread below):
# when create_weight_orig is disabled, fall back to cloning the current weight.
    weight_orig: Tensor = self.layer.weight_orig.data
else:
    warnings.warn("Warning: GPFQ will perform better with `create_weight_orig=True`.")
    weight_orig: Tensor = weight.clone()
Collaborator

Why are we still creating weight_orig and cloning?

i-colbert (Collaborator, Author) commented Feb 10, 2025

Leaving the option for a user to disable create_weight_orig in case of memory issues. In such a scenario, we still need to track the original floating-point weights for the update rules, but the floating-point activations aren't the true floating-point activations. This should still save memory, since the duplicate weights won't be stored.
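
A hedged usage sketch of the trade-off described above. The toy model and calibration data are placeholders, and the create_weight_orig keyword is assumed to be exposed by gpfq_mode, as the discussion implies:

```python
# Hedged sketch: running GPFQ with create_weight_orig disabled to save memory.
import torch
import torch.nn as nn

import brevitas.nn as qnn
from brevitas.graph.gpfq import gpfq_mode

# Hypothetical toy model and calibration data, for illustration only.
model = nn.Sequential(qnn.QuantLinear(16, 8, bias=True))
calib_data = [torch.randn(32, 16) for _ in range(4)]

model.eval()
with torch.no_grad(), gpfq_mode(model, create_weight_orig=False) as gpfq:
    gpfq_model = gpfq.model
    for _ in range(gpfq.num_layers):
        # Feed calibration data through the model once per layer, then update.
        for x in calib_data:
            gpfq_model(x)
        gpfq.update()
```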

src/brevitas/graph/gpfq.py — resolved (outdated) review thread
@@ -4,199 +4,40 @@
from copy import deepcopy
import math
from typing import List, Optional
import warnings
Collaborator

This is OK, but from now on I want to stop using warnings and move everything to logging, so we have a bit more control.
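
A minimal sketch of the suggested direction (illustrative, not part of this diff): replace warnings.warn with a module-level logger, so users can filter or silence the message through standard logging configuration. The helper name is hypothetical.

```python
# Illustrative only: replacing warnings.warn with a module-level logger.
import logging

logger = logging.getLogger(__name__)


def warn_if_no_weight_orig(create_weight_orig: bool) -> None:
    # Hypothetical helper; the message mirrors the one in the diff above.
    if not create_weight_orig:
        logger.warning("GPFQ will perform better with `create_weight_orig=True`.")
```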

from brevitas.graph.gptq import GPTQ
from brevitas.graph.gpxq import SUPPORTED_CONV_OP
from brevitas.graph.gpxq import SUPPORTED_TCONV_OP
from brevitas.utils.quant_utils import _CachedIO
Collaborator

This is only for IntQuantTensor, but I guess it's fine.

i-colbert (Collaborator, Author)

The AXE implementation is currently still experimental. Will work to extend support in a future PR.

Giuseppe5 self-requested a review on February 12, 2025, 16:17
Giuseppe5 merged commit 9b2939e into Xilinx:dev on Feb 13, 2025
389 of 396 checks passed
Labels: next release (PRs which should be merged for the next release)
Projects: None yet
2 participants