Skip to content

[QuantizationMetadata] update for missing qparams and module attributes#669

Open
brian-dellabetta wants to merge 4 commits into
mainfrom
bdellabe/quantization-metadata-updates
Open

[QuantizationMetadata] update for missing qparams and module attributes#669
brian-dellabetta wants to merge 4 commits into
mainfrom
bdellabe/quantization-metadata-updates

Conversation

@brian-dellabetta

@brian-dellabetta brian-dellabetta commented Apr 8, 2026

Copy link
Copy Markdown
Collaborator

Add some missing qparams and attributes (quantization_status, quantization_enabled) to QuantizationMetadata which were previously missing, so they weren't being pruned correctly.

Summary by CodeRabbit

  • Bug Fixes
    • Quantization metadata now includes weight dimensions and packed-state info so all relevant parameter details are tracked.
    • Clearing quantization now fully removes quantization status flags and related configuration, preventing leftover state and ensuring modules behave cleanly after disabling quantization.

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db91645b-95ea-4a21-acd3-c2293251e235

📥 Commits

Reviewing files that changed from the base of the PR and between 77fb944 and 3ff457f.

📒 Files selected for processing (1)
  • src/compressed_tensors/quantization/quant_metadata.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/compressed_tensors/quantization/quant_metadata.py

📝 Walkthrough

Walkthrough

QuantizationMetadata in src/compressed_tensors/quantization/quant_metadata.py was updated: all_qparam_names() now includes explicit weight_shape and weight_packed names (and still prepends KVCacheScaleType.KEY.value and KVCacheScaleType.VALUE.value) in addition to generated {base}_{suffix} names; clear_quantization() now also deletes quantization_status and quantization_enabled.

Changes

Cohort / File(s) Summary
QuantizationMetadata updates
src/compressed_tensors/quantization/quant_metadata.py
all_qparam_names() extended to include explicit weight_shape and weight_packed alongside generated {base}_{suffix} entries (bases: input, weight, output; suffixes: global_scale, scale, zero_point, g_idx), and still prepends KVCacheScaleType.KEY.value/VALUE.value. clear_quantization() now removes quantization_status and quantization_enabled in addition to clearing qparams, quantization_scheme, and unwrapping forward.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through the bytes and found a new track,

weight_shape and weight_packed join the pack.
I cleared every flag, left no tiny trace,
status and enabled now vanish from place.
A joyful nibble—code tidy, swift pace.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: updates to QuantizationMetadata to include missing qparams and module attributes (weight_shape, weight_packed, quantization_status, quantization_enabled).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bdellabe/quantization-metadata-updates

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/compressed_tensors/quantization/quant_metadata.py (1)

65-87: ⚠️ Potential issue | 🟠 Major

Docstring contradicts implementation.

Line 65 states "quantization_status and quantization_enabled are left unchanged," but lines 81-87 now delete these attributes. Please update the docstring to reflect the new behavior.

Proposed fix
         Remove all artifacts of quantization from module, non-recursively.
         Artifacts include any qparams, quantization_scheme, or wrapped
         forward method that might have been altered previously in lifecycle.

-        `quantization_status` and `quantization_enabled` are left unchanged.
-
         :param module: Module to clear
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/quantization/quant_metadata.py` around lines 65 - 87,
The docstring is out of sync: it claims "`quantization_status` and
`quantization_enabled` are left unchanged" but the implementation (within the
block using unwrap_offload_forward that unwraps module.forward, calls
cls.clear_all_qparams, and deletes module.quantization_scheme,
module.quantization_status, and module.quantization_enabled) actually removes
those attributes; update the docstring in this method to state that
quantization_status and quantization_enabled are deleted/cleared (matching the
code) and mention that module.forward is unwrapped and qparams and
quantization_scheme are cleared to keep docs consistent with the behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compressed_tensors/quantization/quant_metadata.py`:
- Around line 29-43: The returned metadata list currently contains a duplicate
"weight_shape" because it's added explicitly and also generated by the
comprehension (base_name="weight" + suffix="shape"); update the construction in
quant_metadata.py to avoid duplicates by removing the explicit "weight_shape"
from the first concatenated list (keep KVCacheScaleType entries and
"weight_packed"), or alternatively remove "shape" from the suffix tuple and add
explicit "input_shape" and "output_shape" if you need them; ensure references to
KVCacheScaleType, base_name, and suffix in the list comprehension remain
unchanged.

---

Outside diff comments:
In `@src/compressed_tensors/quantization/quant_metadata.py`:
- Around line 65-87: The docstring is out of sync: it claims
"`quantization_status` and `quantization_enabled` are left unchanged" but the
implementation (within the block using unwrap_offload_forward that unwraps
module.forward, calls cls.clear_all_qparams, and deletes
module.quantization_scheme, module.quantization_status, and
module.quantization_enabled) actually removes those attributes; update the
docstring in this method to state that quantization_status and
quantization_enabled are deleted/cleared (matching the code) and mention that
module.forward is unwrapped and qparams and quantization_scheme are cleared to
keep docs consistent with the behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ebfd6244-37ea-4261-bff5-25d764a67f26

📥 Commits

Reviewing files that changed from the base of the PR and between 8ed3cf7 and d79130c.

📒 Files selected for processing (1)
  • src/compressed_tensors/quantization/quant_metadata.py

Comment thread src/compressed_tensors/quantization/quant_metadata.py
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compressed_tensors/quantization/quant_metadata.py`:
- Around line 80-86: clear_quantization() currently deletes
module.quantization_status and quantization_enabled which makes forward readers
like forward_quantize (in
src/compressed_tensors/quantization/lifecycle/forward.py) crash when they access
module.quantization_status directly; to fix, update the forward reader to
tolerate missing attrs by replacing direct accesses with guarded lookups (e.g.,
status = getattr(module, "quantization_status", None) and check status is not
None before comparing to QuantizationStatus.COMPRESSED, and similarly guard
quantization_enabled reads), then you can either keep attribute deletion or
change clear_quantization() to set those attributes to None to match the
docstring—ensure forward_quantize and any other readers use getattr-based guards
(use the suggested minimal guard around the base_name=="weight" /
QuantizationStatus.COMPRESSED check) and update the method docstring if you
change clear_quantization() behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 15391856-32ea-496d-b013-dc7916bc2342

📥 Commits

Reviewing files that changed from the base of the PR and between d79130c and 77fb944.

📒 Files selected for processing (1)
  • src/compressed_tensors/quantization/quant_metadata.py

Comment thread src/compressed_tensors/quantization/quant_metadata.py

@kylesayrs kylesayrs left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel strongly that this is out of scope for the quantizationmetadata class, and should instead be handled by deleting unused parameters during decompressoon

@mergify

mergify Bot commented Jun 22, 2026

Copy link
Copy Markdown

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @brian-dellabetta.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants