Fix CPU QLinearConv: support per-channel weight zero points with distinct values #28456

Draft
Copilot wants to merge 3 commits into main from copilot/fix-qlinearconv-per-channel-zero-points

Conversation

Contributor

Copilot AI commented May 11, 2026

Description

The CPU QLinearConv kernel incorrectly rejected per-channel weight zero point tensors whose values were not all identical, even though the ONNX spec allows this for asymmetric per-channel quantization.

Kernel (qlinearconv.cc):

  • Removed the ORT_ENFORCE in ComputeOffset that required all per-channel W zero points to be equal
  • Moved W zero-point reading out of ComputeOffset and into Compute(), exposing the full per-channel array to the dispatch logic
  • Added W_zero_point_is_per_channel / W_zero_point_is_uniform flags
  • GEMM path: sets PerColumnZeroPoints = true and passes W_zero_point_data + group_id * group_output_channels when ZPs differ — MLAS already supported this
  • Depthwise path: requires uniform W zero points (since MlasConvDepthwise takes a scalar FilterZeroPoint); non-uniform per-channel ZPs automatically fall back to the group-GEMM path instead
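The GEMM-path behavior above can be sketched in NumPy terms (illustrative shapes and values; the real kernel drives MLAS with PerColumnZeroPoints = true rather than doing this in NumPy):

```python
import numpy as np

# A 1x1 conv over one pixel reduces to a GEMM: rows of w_q are output channels.
x_q = np.array([[10, 20, 30]], dtype=np.uint8)          # 1 pixel, Cin = 3
w_q = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)  # Cout = 2 rows
x_zp = 12
w_zp = np.array([5, 90])                                # distinct per-channel ZPs

# Per-column zero points: each output channel c subtracts its own w_zp[c]
# from every weight in that channel before accumulating in int32.
acc = (x_q.astype(np.int32) - x_zp) @ (w_q.astype(np.int32) - w_zp[:, None]).T
# acc has shape [pixels, Cout]; here acc == [[-52, -2020]]
```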

Tests (qlinearconv_op_test.cc):

  • Added zero_points_ vector field to QuantizedTensor and SetWeightZeroPoints() method to QLinearConvOpTester
  • Updated ComputeExpectedOutput and Run() to emit a per-channel ZP tensor when set
  • Added three new test cases covering uint8 activations, int8 activations, and grouped convolution with per-channel W zero points
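A reference computation of the expected output for such tests can be sketched as follows (a hypothetical NumPy helper for a single-pixel 1x1 case, mirroring the dequantize → conv → requantize idea behind ComputeExpectedOutput; all names and values here are illustrative):

```python
import numpy as np

def qlinearconv_1x1_ref(x_q, x_scale, x_zp, w_q, w_scale, w_zp, y_scale, y_zp):
    """Reference 1x1 QLinearConv over one pixel with per-channel W scale/ZP."""
    # Dequantize the activation and, per output channel, the weights.
    x = (x_q.astype(np.float64) - x_zp) * x_scale
    w = (w_q.astype(np.float64) - w_zp[:, None]) * w_scale[:, None]
    y = w @ x                                    # float conv result, shape [Cout]
    # Requantize to uint8 with the output scale and zero point.
    q = np.rint(y / y_scale) + y_zp
    return np.clip(q, 0, 255).astype(np.uint8)

y = qlinearconv_1x1_ref(
    x_q=np.array([10, 20], dtype=np.uint8), x_scale=0.1, x_zp=12,
    w_q=np.array([[100, 100], [100, 100]], dtype=np.uint8),
    w_scale=np.array([0.02, 0.05]), w_zp=np.array([5, 90]),
    y_scale=0.5, y_zp=128)
# y == [130, 129]
```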

Motivation and Context

CPUExecutionProvider threw `QLinearConv : zero point of per-channel filter must be same` at runtime for any model using asymmetric per-channel weight quantization (distinct zero points per output channel), even though `w_scale` and `w_zp` were both valid 1-D [Cout] tensors per the ONNX spec. This made a common quantization pattern unusable on CPU.

```python
w_zp = np.array([5, 90], dtype=np.uint8)  # different per-channel ZPs → was rejected
```
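Distinct zero points arise naturally from asymmetric per-channel quantization, since each output channel derives its own (scale, zero_point) from its own min/max range. A minimal sketch (illustrative values; not onnxruntime's quantization tooling):

```python
import numpy as np

w = np.array([[0.1, 2.55], [-1.0, 1.55]])       # float weights, [Cout=2, Cin=2]
lo, hi = w.min(axis=1), w.max(axis=1)
lo, hi = np.minimum(lo, 0), np.maximum(hi, 0)   # quantized range must include 0
scale = (hi - lo) / 255.0                       # per-channel scales
zp = np.rint(-lo / scale).astype(np.uint8)      # per-channel zero points
# zp == [0, 100]: channels with different ranges get different zero points
```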

Copilot AI changed the title [WIP] Fix CPU QLinearConv for per-channel weight zero points Fix CPU QLinearConv: support per-channel weight zero points with distinct values May 11, 2026
Copilot AI requested a review from tianleiwu May 11, 2026 18:19
Contributor

@github-actions bot left a comment


You can commit the suggested changes from lintrunner.

Comment thread: onnxruntime/core/providers/cpu/quantization/qlinearconv.cc (Outdated)
Contributor

@tianleiwu tianleiwu left a comment


Kernel-side routing looks correct overall; the remaining gap is regression coverage around the new depthwise fallback.

Comment thread: onnxruntime/test/providers/cpu/nn/qlinearconv_op_test.cc

Development

Successfully merging this pull request may close these issues.

CPU QLinearConv rejects per-channel weight zero points with different values

2 participants