Context
The MatMul operation in OpenVINO assumes an implicit shape alignment for its input arguments. It applies the transpositions specified by the optional `transpose_a` and `transpose_b` attributes: OV spec.
Currently, weight compression in NNCF does not support `transpose_b=False`. Here's the test.
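To make the two weight layouts concrete, here is a minimal sketch (not part of the issue) that builds a single-MatMul OpenVINO model in either layout. The `build_matmul_model` helper, the toy dimensions, and the `opset13` import path are assumptions for illustration only:

```python
import numpy as np
import openvino as ov
from openvino.runtime import opset13 as opset


def build_matmul_model(transpose_b: bool) -> ov.Model:
    """Toy model with one MatMul whose weight layout depends on transpose_b."""
    in_features, out_features = 8, 4
    inp = opset.parameter([1, in_features], dtype=np.float32, name="input")
    # transpose_b=True stores the weight as [out_features, in_features]
    # (the layout NNCF currently supports); transpose_b=False stores it
    # as [in_features, out_features].
    weight_shape = [out_features, in_features] if transpose_b else [in_features, out_features]
    weight = opset.constant(np.ones(weight_shape, dtype=np.float32))
    matmul = opset.matmul(inp, weight, transpose_a=False, transpose_b=transpose_b)
    return ov.Model([matmul], [inp], name="matmul_model")
```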
This potentially affects the Mixed-Precision, AWQ, Scale Estimation, GPTQ, and LoRA Correction algorithms.
What needs to be done?
The task is to enable the data-aware weight compression methods (Mixed-Precision, AWQ, Scale Estimation, LoRA Correction, GPTQ) for models whose matrix multiplications have non-transposed weights (`transpose_b=False`).
`test_compression_with_transpose` shouldn't raise an error for `transpose_b=False`; a sketch of the expected behavior follows below.
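For illustration only, a hedged sketch of the target behavior, reusing the hypothetical `build_matmul_model` helper from the sketch above. Once the task is done, a data-aware call like this should succeed regardless of the weight layout (on such a toy model some algorithms may have nothing to rescale, but they must not raise):

```python
import numpy as np
import nncf

model = build_matmul_model(transpose_b=False)
# A few random calibration samples matching the model input shape.
calibration_data = [np.random.rand(1, 8).astype(np.float32) for _ in range(4)]

compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=-1,          # per-channel quantization; toy dims are too small for groups
    all_layers=True,        # compress even the single MatMul in this toy model
    awq=True,               # any data-aware option should behave the same...
    scale_estimation=True,  # ...once transpose_b=False is supported
    dataset=nncf.Dataset(calibration_data),
)
```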
Example Pull Requests
#3230
#3296
Resources
Contact points
@ljaljushkin
Ticket
No response