Describe the issue
ONNX Runtime's CPUExecutionProvider raises a runtime exception for QLinearConv when the weight zero point is a 1-D per-channel tensor whose values differ across output channels.
In the reproducer below, the weight tensor has Cout=2, with:
w_s = [0.02, 0.02]
w_zp = [5, 90]
Both w_s and w_zp are 1-D tensors of shape [Cout]. However, ORT raises:
QLinearConv : zero point of per-channel filter must be same
This suggests that the CPU QLinearConv kernel does not support asymmetric per-channel weight quantization, i.e. a w_zp whose entries differ from one output channel to the next.
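For triage, here is a minimal sketch of a pre-flight check that tells whether a given w_zp would hit this error. It assumes the kernel's condition is exactly what the message says (all per-channel zero points must be equal); I have not confirmed that against the kernel source. The helper name has_uniform_zero_point is hypothetical.

```python
import numpy as np


def has_uniform_zero_point(w_zp):
    """Return True if every per-channel zero point has the same value.

    Assumption: this mirrors the condition implied by the error text
    "zero point of per-channel filter must be same".
    """
    zp = np.atleast_1d(np.asarray(w_zp))
    return bool(np.all(zp == zp[0]))


# The zero points from this report differ, so the check fails.
report_w_zp = np.array([5, 90], dtype=np.uint8)
print(has_uniform_zero_point(report_w_zp))
```

A 1-D w_zp such as [5, 5] passes this check, which is consistent with the error being about differing values rather than about the tensor's shape.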
To reproduce
import numpy as np
import onnxruntime as ort
from onnx import TensorProto, helper, numpy_helper
Cout = 2
w_q = np.full((Cout, 1, 2, 2), 100, np.uint8)
w_s = np.array([0.02, 0.02], dtype=np.float32)
w_zp = np.array([5, 90], dtype=np.uint8)
x_q = np.full((1, 1, 3, 3), 130, dtype=np.uint8)
inits = [
    numpy_helper.from_array(np.float32(0.05), "x_s"),
    numpy_helper.from_array(np.uint8(128), "x_zp"),
    numpy_helper.from_array(w_q, "W"),
    numpy_helper.from_array(w_s, "w_s"),
    numpy_helper.from_array(w_zp, "w_zp"),
    numpy_helper.from_array(np.float32(0.1), "y_s"),
    numpy_helper.from_array(np.uint8(128), "y_zp"),
]
node = helper.make_node(
    "QLinearConv",
    ["x", "x_s", "x_zp", "W", "w_s", "w_zp", "y_s", "y_zp"],
    ["y"],
    pads=[0, 0, 0, 0],
)
g = helper.make_graph(
    [node],
    "g",
    [helper.make_tensor_value_info("x", TensorProto.UINT8, [1, 1, 3, 3])],
    [helper.make_tensor_value_info("y", TensorProto.UINT8, [1, Cout, 2, 2])],
    initializer=inits,
)
m = helper.make_model(g, opset_imports=[helper.make_opsetid("", 13)])
m.ir_version = 8
sess = ort.InferenceSession(
    m.SerializeToString(),
    providers=["CPUExecutionProvider"],
)
out = sess.run(None, {"x": x_q})[0]
print(out)
Urgency
Expected output
Expected: the model runs with per-channel w_zp values, since w_s and w_zp are both 1-D tensors with shape [Cout].
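To make the expectation concrete, below is a naive NumPy reference for the values in the reproducer, treating w_zp as a per-output-channel zero point. It is only a sketch under stated assumptions: stride 1, no padding, group 1, uint8 output, scaling done in float64, and round-half-to-even via np.rint. The function name qlinearconv_ref is hypothetical, and the real kernel's rounding may differ in edge cases.

```python
import numpy as np


def qlinearconv_ref(x_q, x_s, x_zp, w_q, w_s, w_zp, y_s, y_zp):
    """Naive QLinearConv reference (stride 1, no padding, group 1, uint8 out)."""
    n, c_in, h, wd = x_q.shape
    c_out, _, kh, kw = w_q.shape

    # Remove zero points in int32; w_zp broadcasts per output channel.
    x_i = x_q.astype(np.int32) - np.int32(x_zp)
    w_i = w_q.astype(np.int32) - np.asarray(w_zp, dtype=np.int32).reshape(-1, 1, 1, 1)

    # Integer accumulation over each sliding window.
    oh, ow = h - kh + 1, wd - kw + 1
    acc = np.zeros((n, c_out, oh, ow), dtype=np.int32)
    for i in range(oh):
        for j in range(ow):
            patch = x_i[:, :, i:i + kh, j:j + kw]  # shape (n, c_in, kh, kw)
            acc[:, :, i, j] = np.einsum("nchw,ochw->no", patch, w_i)

    # Requantize: per-channel scale x_s * w_s / y_s, round, add y_zp, saturate.
    scale = np.float64(x_s) * np.asarray(w_s, dtype=np.float64) / np.float64(y_s)
    y = np.rint(acc * scale.reshape(1, -1, 1, 1)) + int(y_zp)
    return np.clip(y, 0, 255).astype(np.uint8)


# Same values as the reproducer.
w_q = np.full((2, 1, 2, 2), 100, dtype=np.uint8)
w_s = np.array([0.02, 0.02], dtype=np.float32)
w_zp = np.array([5, 90], dtype=np.uint8)
x_q = np.full((1, 1, 3, 3), 130, dtype=np.uint8)

expected = qlinearconv_ref(
    x_q, np.float32(0.05), np.uint8(128), w_q, w_s, w_zp, np.float32(0.1), np.uint8(128)
)
print(expected)  # channel 0 is all 136, channel 1 is all 129
```

By hand: each window sums 4 * (130 - 128) * (100 - 5) = 760 for channel 0 and 4 * 2 * (100 - 90) = 80 for channel 1. With scale 0.05 * 0.02 / 0.1 = 0.01, these round to 8 and 1, giving 136 and 129 after adding y_zp = 128.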
Actual output
[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running QLinearConv node.
QLinearConv : zero point of per-channel filter must be same.
This happens by design if the quantization is symmetric.
Platform
Linux
OS Version
Linux-6.17.0-20-generic-x86_64-with-glibc2.39
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.25.1
ONNX Runtime API
Python
Architecture
X86
Execution Provider
Default CPU
Execution Provider Library Version
No response