CPU QLinearConv rejects per-channel weight zero points with different values #28447

@ALinrunrun

Describe the issue

ONNX Runtime CPUExecutionProvider raises a runtime exception for QLinearConv when the weight zero point is a 1-D per-channel tensor with different values.

In the reproducer below, the weight tensor has Cout=2, with:

w_s = [0.02, 0.02]

w_zp = [5, 90]

Both w_s and w_zp are 1-D tensors of shape [Cout]. However, ORT raises:

QLinearConv : zero point of per-channel filter must be same

The CPU QLinearConv kernel appears not to support asymmetric per-channel weight quantization, that is, per-channel weight zero points that differ across output channels.
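As a rough illustration of the condition the error message implies, the check below tests whether a per-channel zero-point vector holds a single repeated value. The helper name `has_uniform_zero_point` is hypothetical, and this is only a sketch inferred from the message text, not the kernel's actual source.

```python
import numpy as np


def has_uniform_zero_point(w_zp: np.ndarray) -> bool:
    """Return True if every per-channel zero point equals the first one.

    Hypothetical helper: it mirrors the wording of the error message only.
    """
    flat = np.asarray(w_zp).reshape(-1)
    return bool(np.all(flat == flat[0]))


# The zero points from this report differ per channel, so the check fails.
print(has_uniform_zero_point(np.array([5, 90], dtype=np.uint8)))
# A vector with one repeated value passes the check.
print(has_uniform_zero_point(np.array([5, 5], dtype=np.uint8)))
```

Running a check like this over the weight zero points before creating a session may help identify which QLinearConv nodes in a larger model would hit the same error.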

To reproduce

import numpy as np
import onnxruntime as ort
from onnx import TensorProto, helper, numpy_helper

Cout = 2

w_q = np.full((Cout, 1, 2, 2), 100, np.uint8)
w_s = np.array([0.02, 0.02], dtype=np.float32)
w_zp = np.array([5, 90], dtype=np.uint8)
x_q = np.full((1, 1, 3, 3), 130, dtype=np.uint8)

inits = [
    numpy_helper.from_array(np.float32(0.05), "x_s"),
    numpy_helper.from_array(np.uint8(128), "x_zp"),
    numpy_helper.from_array(w_q, "W"),
    numpy_helper.from_array(w_s, "w_s"),
    numpy_helper.from_array(w_zp, "w_zp"),
    numpy_helper.from_array(np.float32(0.1), "y_s"),
    numpy_helper.from_array(np.uint8(128), "y_zp"),
]

node = helper.make_node(
    "QLinearConv",
    ["x", "x_s", "x_zp", "W", "w_s", "w_zp", "y_s", "y_zp"],
    ["y"],
    pads=[0, 0, 0, 0],
)

g = helper.make_graph(
    [node],
    "g",
    [helper.make_tensor_value_info("x", TensorProto.UINT8, [1, 1, 3, 3])],
    [helper.make_tensor_value_info("y", TensorProto.UINT8, [1, Cout, 2, 2])],
    initializer=inits,
)

m = helper.make_model(g, opset_imports=[helper.make_opsetid("", 13)])
m.ir_version = 8

sess = ort.InferenceSession(
    m.SerializeToString(),
    providers=["CPUExecutionProvider"],
)

out = sess.run(None, {"x": x_q})[0]
print(out)

Urgency

Expected output

Expected: the model runs with per-channel w_zp values, since w_s and w_zp are both 1-D tensors with shape [Cout].
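For reference, the values one might expect can be sketched with a naive NumPy computation of the standard quantized-convolution formula, subtracting a separate zero point per output channel. The function name `qlinearconv_reference` is hypothetical, and the sketch assumes stride 1, no padding, group 1, uint8 output, and round-half-to-even. It is not ORT's implementation.

```python
import numpy as np


def qlinearconv_reference(x_q, x_s, x_zp, w_q, w_s, w_zp, y_s, y_zp):
    """Naive reference (hypothetical helper): stride 1, no padding, group 1."""
    n, _, h, w = x_q.shape
    c_out, _, kh, kw = w_q.shape
    # Remove zero points in int32; w_zp is broadcast per output channel.
    x_i = x_q.astype(np.int32) - np.int32(x_zp)
    w_i = w_q.astype(np.int32) - w_zp.astype(np.int32).reshape(c_out, 1, 1, 1)
    oh, ow = h - kh + 1, w - kw + 1
    acc = np.zeros((n, c_out, oh, ow), dtype=np.int32)
    for i in range(oh):
        for j in range(ow):
            patch = x_i[:, :, i:i + kh, j:j + kw]  # shape (n, c_in, kh, kw)
            acc[:, :, i, j] = np.einsum("nchw,ochw->no", patch, w_i)
    # Requantize with a per-channel multiplier x_s * w_s[c] / y_s.
    scale = np.float64(x_s) * w_s.astype(np.float64) / np.float64(y_s)
    y = np.rint(acc * scale.reshape(1, c_out, 1, 1)) + int(y_zp)
    return np.clip(y, 0, 255).astype(np.uint8)


# Same tensors as in the reproducer.
w_q = np.full((2, 1, 2, 2), 100, np.uint8)
w_s = np.array([0.02, 0.02], dtype=np.float32)
w_zp = np.array([5, 90], dtype=np.uint8)
x_q = np.full((1, 1, 3, 3), 130, dtype=np.uint8)

y = qlinearconv_reference(x_q, np.float32(0.05), 128, w_q, w_s, w_zp,
                          np.float32(0.1), 128)
print(y[0, 0])  # channel 0: every element is 136
print(y[0, 1])  # channel 1: every element is 129
```

Under these assumptions, channel 0 accumulates 4 * (130 - 128) * (100 - 5) = 760 and channel 1 accumulates 4 * (130 - 128) * (100 - 90) = 80. With the multiplier 0.05 * 0.02 / 0.1 = 0.01, these round to 8 and 1, giving 136 and 129 after adding y_zp.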

Actual output

[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running QLinearConv node.

QLinearConv : zero point of per-channel filter must be same.
This happens by design if the quantization is symmetric.

Platform

Linux

OS Version

Linux-6.17.0-20-generic-x86_64-with-glibc2.39

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.25.1

ONNX Runtime API

Python

Architecture

X86

Execution Provider

Default CPU

Execution Provider Library Version

No response
