[QuantizationMetadata] update for missing qparams and module attributes by brian-dellabetta · Pull Request #669 · vllm-project/compressed-tensors

brian-dellabetta · 2026-04-08T17:21:26Z

Add some missing qparams and attributes (quantization_status, quantization_enabled) to QuantizationMetadata which were previously missing, so they weren't being pruned correctly.

Summary by CodeRabbit

Bug Fixes
- Quantization metadata now includes weight dimensions and packed-state info so all relevant parameter details are tracked.
- Clearing quantization now fully removes quantization status flags and related configuration, preventing leftover state and ensuring modules behave cleanly after disabling quantization.

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>

coderabbitai · 2026-04-08T17:23:30Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: db91645b-95ea-4a21-acd3-c2293251e235

📥 Commits

Reviewing files that changed from the base of the PR and between 77fb944 and 3ff457f.

📒 Files selected for processing (1)

src/compressed_tensors/quantization/quant_metadata.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/compressed_tensors/quantization/quant_metadata.py

📝 Walkthrough

Walkthrough

QuantizationMetadata in src/compressed_tensors/quantization/quant_metadata.py was updated: all_qparam_names() now includes explicit weight_shape and weight_packed names (and still prepends KVCacheScaleType.KEY.value and KVCacheScaleType.VALUE.value) in addition to generated {base}_{suffix} names; clear_quantization() now also deletes quantization_status and quantization_enabled.

Changes

Cohort / File(s)	Summary
QuantizationMetadata updates `src/compressed_tensors/quantization/quant_metadata.py`	`all_qparam_names()` extended to include explicit `weight_shape` and `weight_packed` alongside generated `{base}_{suffix}` entries (bases: `input`, `weight`, `output`; suffixes: `global_scale`, `scale`, `zero_point`, `g_idx`), and still prepends `KVCacheScaleType.KEY.value`/`VALUE.value`. `clear_quantization()` now removes `quantization_status` and `quantization_enabled` in addition to clearing qparams, `quantization_scheme`, and unwrapping `forward`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through the bytes and found a new track,

weight_shape and weight_packed join the pack.
I cleared every flag, left no tiny trace,
status and enabled now vanish from place.
A joyful nibble—code tidy, swift pace.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: updates to QuantizationMetadata to include missing qparams and module attributes (weight_shape, weight_packed, quantization_status, quantization_enabled).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch bdellabe/quantization-metadata-updates

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/compressed_tensors/quantization/quant_metadata.py (1)

65-87: ⚠️ Potential issue | 🟠 Major

Docstring contradicts implementation.

Line 65 states "quantization_status and quantization_enabled are left unchanged," but lines 81-87 now delete these attributes. Please update the docstring to reflect the new behavior.

Proposed fix

         Remove all artifacts of quantization from module, non-recursively.
         Artifacts include any qparams, quantization_scheme, or wrapped
         forward method that might have been altered previously in lifecycle.

-        `quantization_status` and `quantization_enabled` are left unchanged.
-
         :param module: Module to clear

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/compressed_tensors/quantization/quant_metadata.py` around lines 65 - 87,
The docstring is out of sync: it claims "`quantization_status` and
`quantization_enabled` are left unchanged" but the implementation (within the
block using unwrap_offload_forward that unwraps module.forward, calls
cls.clear_all_qparams, and deletes module.quantization_scheme,
module.quantization_status, and module.quantization_enabled) actually removes
those attributes; update the docstring in this method to state that
quantization_status and quantization_enabled are deleted/cleared (matching the
code) and mention that module.forward is unwrapped and qparams and
quantization_scheme are cleared to keep docs consistent with the behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compressed_tensors/quantization/quant_metadata.py`:
- Around line 29-43: The returned metadata list currently contains a duplicate
"weight_shape" because it's added explicitly and also generated by the
comprehension (base_name="weight" + suffix="shape"); update the construction in
quant_metadata.py to avoid duplicates by removing the explicit "weight_shape"
from the first concatenated list (keep KVCacheScaleType entries and
"weight_packed"), or alternatively remove "shape" from the suffix tuple and add
explicit "input_shape" and "output_shape" if you need them; ensure references to
KVCacheScaleType, base_name, and suffix in the list comprehension remain
unchanged.

---

Outside diff comments:
In `@src/compressed_tensors/quantization/quant_metadata.py`:
- Around line 65-87: The docstring is out of sync: it claims
"`quantization_status` and `quantization_enabled` are left unchanged" but the
implementation (within the block using unwrap_offload_forward that unwraps
module.forward, calls cls.clear_all_qparams, and deletes
module.quantization_scheme, module.quantization_status, and
module.quantization_enabled) actually removes those attributes; update the
docstring in this method to state that quantization_status and
quantization_enabled are deleted/cleared (matching the code) and mention that
module.forward is unwrapped and qparams and quantization_scheme are cleared to
keep docs consistent with the behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ebfd6244-37ea-4261-bff5-25d764a67f26

📥 Commits

Reviewing files that changed from the base of the PR and between 8ed3cf7 and d79130c.

📒 Files selected for processing (1)

src/compressed_tensors/quantization/quant_metadata.py

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/compressed_tensors/quantization/quant_metadata.py`:
- Around line 80-86: clear_quantization() currently deletes
module.quantization_status and quantization_enabled which makes forward readers
like forward_quantize (in
src/compressed_tensors/quantization/lifecycle/forward.py) crash when they access
module.quantization_status directly; to fix, update the forward reader to
tolerate missing attrs by replacing direct accesses with guarded lookups (e.g.,
status = getattr(module, "quantization_status", None) and check status is not
None before comparing to QuantizationStatus.COMPRESSED, and similarly guard
quantization_enabled reads), then you can either keep attribute deletion or
change clear_quantization() to set those attributes to None to match the
docstring—ensure forward_quantize and any other readers use getattr-based guards
(use the suggested minimal guard around the base_name=="weight" /
QuantizationStatus.COMPRESSED check) and update the method docstring if you
change clear_quantization() behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 15391856-32ea-496d-b013-dc7916bc2342

📥 Commits

Reviewing files that changed from the base of the PR and between d79130c and 77fb944.

📒 Files selected for processing (1)

src/compressed_tensors/quantization/quant_metadata.py

kylesayrs

I feel strongly that this is out of scope for the quantizationmetadata class, and should instead be handled by deleting unused parameters during decompressoon

mergify · 2026-06-22T21:38:32Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @brian-dellabetta.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

update quant metadata fields

d79130c

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>

brian-dellabetta requested review from HDCharles and kylesayrs April 8, 2026 17:21

coderabbitai Bot reviewed Apr 8, 2026

View reviewed changes

Comment thread src/compressed_tensors/quantization/quant_metadata.py

bugfix

77fb944

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>

coderabbitai Bot reviewed Apr 8, 2026

View reviewed changes

Comment thread src/compressed_tensors/quantization/quant_metadata.py

brian-dellabetta mentioned this pull request Apr 8, 2026

[save_pretrained] UX improvement for save_compressed=False vllm-project/llm-compressor#2588

Closed

1 task

HDCharles approved these changes Apr 8, 2026

View reviewed changes

brian-dellabetta added 2 commits April 8, 2026 17:26

remove docstring

0883111

Merge branch 'main' into bdellabe/quantization-metadata-updates

3ff457f

kylesayrs requested changes Apr 9, 2026

View reviewed changes

mergify Bot added the needs-rebase label Jun 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[QuantizationMetadata] update for missing qparams and module attributes#669

[QuantizationMetadata] update for missing qparams and module attributes#669
brian-dellabetta wants to merge 4 commits into
mainfrom
bdellabe/quantization-metadata-updates

brian-dellabetta commented Apr 8, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 8, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

kylesayrs left a comment

Uh oh!

mergify Bot commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

brian-dellabetta commented Apr 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kylesayrs left a comment

Choose a reason for hiding this comment

Uh oh!

mergify Bot commented Jun 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

brian-dellabetta commented Apr 8, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 8, 2026 •

edited

Loading