Your current environment
The output of python collect_env.py
Your output of `python collect_env.py` here
🐛 Describe the bug
The vLLM model structure may differ from the HuggingFace model structure, so a remapping (WeightsMapper) is applied when loading models. The remapping is defined in the model executor logic, e.g. Qwen2.5-VL defines it here
In quantized models, in addition to the model weights, there may be extra config entries describing which modules are not quantized. For example, compressed-tensors has an "ignore" list in its config naming modules that shall be treated as not quantized; NVIDIA ModelOpt has an "exclude_modules" list in its config serving the same purpose. The issue is that these entries may not be simple module path/prefix names: they may be regexes (in compressed-tensors) or wildcards (in NVIDIA ModelOpt).
Take compressed-tensors as an example, given that it is also a vLLM-owned project. An item in the ignore list may be a regex pattern such as "re:vision_tower.*" to exclude the whole vision encoder in a VLM. One example quantized model found on HF:
https://huggingface.co/gaunernst/gemma-3-27b-it-qat-compressed-tensors
In its config, it has the ignore list:
"ignore": [
"lm_head",
"re:vision_tower.*"
],
When vLLM loads the quantization configs, it also applies the weights mapping through apply_vllm_mapper, but WeightsMapper assumes it only works with individual weight names. It breaks when it hits a regex or a wildcard. For example, for the above ignore list, if a model has the weight remap:
prefix remap: vision_tower. -> vision.
then the remap will miss the regular expression "re:vision_tower.*".
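To make the failure mode concrete, here is a minimal sketch (the `remap_prefix` helper and `PREFIX_MAP` below are hypothetical and only mimic the prefix-substitution behavior, not vLLM's actual WeightsMapper implementation):

```python
# Hypothetical illustration of the failure mode: a prefix-based remap
# rewrites plain module names, but the regex entry from the ignore list
# passes through untouched because it is never interpreted as a pattern.

PREFIX_MAP = {"vision_tower.": "vision."}  # the prefix remap from the example above


def remap_prefix(name: str) -> str:
    """Apply a simple prefix substitution, as is done for plain weight names."""
    for old, new in PREFIX_MAP.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name


ignore = ["lm_head", "re:vision_tower.*"]
print([remap_prefix(item) for item in ignore])
# ['lm_head', 're:vision_tower.*']  <- the regex entry is not remapped,
# so it no longer matches the modules that vLLM renamed to "vision.*".
```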
Searching for models quantized with compressed-tensors, it seems they each dodge this issue in one way or another. E.g. the above Gemma3 model does not have weight remapping in the vLLM model executor; https://huggingface.co/cpatonn/Qwen3-VL-30B-A3B-Thinking-AWQ-4bit (Qwen3-VL) does have weight remapping, but this quantized checkpoint simply marks all the submodules of the vision encoder as ignored instead of using a regex.
There are ways to dodge the issue, but the semantics are broken. This needs to be fixed so it is semantically correct, and it is currently causing issues for models quantized by NVIDIA ModelOpt.
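One possible direction, sketched very roughly below (the `remap_ignore_entry` helper is hypothetical, not vLLM code; it only handles the simple case where the remapped prefix appears verbatim at the start of a "re:" pattern, while general regexes and ModelOpt-style wildcards would need more careful handling):

```python
PREFIX_MAP = {"vision_tower.": "vision."}  # same hypothetical remap as above


def remap_ignore_entry(entry: str) -> str:
    """Remap plain names and 're:'-prefixed patterns alike.

    Note: inside a regex, "." is a metacharacter, so treating the prefix as
    literal text is only an approximation; it happens to work for patterns
    like "vision_tower.*".
    """
    if entry.startswith("re:"):
        pattern = entry[len("re:"):]
        for old, new in PREFIX_MAP.items():
            if pattern.startswith(old):
                return "re:" + new + pattern[len(old):]
        return entry
    for old, new in PREFIX_MAP.items():
        if entry.startswith(old):
            return new + entry[len(old):]
    return entry


print([remap_ignore_entry(item) for item in ["lm_head", "re:vision_tower.*"]])
# ['lm_head', 're:vision.*']
```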
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.