
Align Arm quantizers and CortexMQuantizer #17701

@AdrianLundell

Description

Overview

The idea is to reimplement the TOSAQuantizer and its subclasses EthosUQuantizer/VgfQuantizer on top of the same modular system used by the Cortex-M backend. This has the following benefits:

  1. Improved configurability - The CortexMQuantizer allows custom annotation filters or even custom quantizers, enabling perfect tailoring of quantization parameters.
  2. Improved visibility into the annotation process - The CortexMQuantizer comes with a QuantizerReporter which helps with debugging the annotation process, and a single quantizer_support file which clearly defines the supported operators.
  3. Code sharing - By aligning the two quantizers, users get more predictable behaviour and both backends continuously benefit from each other's improvements.

API

The function-call API stays consistent with the previous implementation, with new functions added to expose the new configuration possibilities. Known behaviour changes are listed here:

  • Input/output node dtypes are now determined by set_global rather than by the closest annotated node.
  • Nodes with SharedQspecs will by default inherit their dtype from their input rather than having it set explicitly; see SharedQspecQuantizer in the detailed breakdown.

Sketch of new API:

qconfig1 = get_symmetric_quantization_config()  # Old way of creating quantization configs intact
qconfig2 = TOSAQuantizationConfig()             # TOSAQuantizationConfigs can also be created directly 

# Old API still intact
quantizer = TOSAQuantizer()
quantizer.set_global(qconfig1)
quantizer.set_module_name("sigmoid", qconfig2)

# New API function using a NodeFinder to filter out nodes, does the same thing but more flexible
node_finder = ModuleNameNodeFinder("sigmoid")   # Many more available, or create your own implementing the NodeFinder interface
quantizer.set_node_finder(node_finder, qconfig2)

# Third way of doing the same thing, even more flexible
pattern_matcher = PatternMatcher(TOSA_QUANTIZER_SUPPORT_DICT)  # TOSA_QUANTIZER_SUPPORT_DICT is defined in the Arm backend.
pattern_quantizer = PatternQuantizer(qconfig2, node_finder, pattern_matcher)
quantizer.add_quantizer(pattern_quantizer)

Detailed breakdown

The CortexMQuantizer is in turn made up of multiple smaller quantizers that run sequentially, which is what enables the greatest level of flexibility: custom quantizers. In practice, however, two types of predefined quantizers will be used most commonly: PatternQuantizers and the SharedQspecQuantizer.

PatternQuantizer

The PatternQuantizer annotates a selected set of nodes in the operator graph with a given QuantizationConfig. The nodes are selected via a NodeFinder, which can be one of the ready-made finders or a custom implementation. The QuantizationConfig defines which QuantizationSpecs to use for inputs, outputs, weights and biases respectively, i.e. dtypes, symmetric/asymmetric quantization, observer types and so on. Backends may have special requirements on the qspecs for certain operators, for example equal qparams on input and output, which is why the configs are backend-specific. The goal is to expose the choices the user is interested in (int8 or int16 activations?) while hiding implementation details (transpose conv must set ch_axis=1).
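As an illustration of the NodeFinder idea, here is a minimal, self-contained sketch in plain Python. The actual NodeFinder interface in the backend is not spelled out in this post, so the `find` method name and signature below are assumptions, and graph nodes are modeled as plain name strings:

```python
class NodeFinder:
    """Hypothetical base interface: select which nodes to annotate."""

    def find(self, nodes):
        raise NotImplementedError


class TargetNameNodeFinder(NodeFinder):
    """Illustrative custom finder: match nodes whose name contains a substring."""

    def __init__(self, substring):
        self.substring = substring

    def find(self, nodes):
        # Return only the nodes whose name matches the target substring.
        return [n for n in nodes if self.substring in n]


# Usage: pick out the sigmoid node from a toy list of node names.
finder = TargetNameNodeFinder("sigmoid")
matches = finder.find(["x", "add", "sigmoid", "mul", "output"])
```

A custom finder like this would then be passed to `set_node_finder` together with a config, as in the API sketch above.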

The selected nodes are partitioned into patterns, which are defined as supported for each backend by a support_dict. A pattern is a group of nodes which maps to one QuantizationConfig, most commonly a single node or something like a convolution together with an activation function. The support_dict lists all such patterns which the backend handles and maps each to a PatternChecker, which checks whether that particular combination of pattern and QuantizationConfig is supported. For example, a convolution might generally be supported, so the pattern exists in the support_dict, but only for channels_last input or int8 quantization; the PatternChecker rejects other configurations.
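The support_dict concept can be sketched as follows. The real TOSA_QUANTIZER_SUPPORT_DICT keys, config types, and PatternChecker signatures are not shown in this post, so everything below is an illustrative stand-in where patterns are keyed by operator-name tuples and configs are plain dicts:

```python
def check_conv(pattern, qconfig):
    # Hypothetical checker: conv patterns only supported with int8 activations.
    return qconfig.get("activation_dtype") == "int8"


def check_always(pattern, qconfig):
    # Some ops are supported regardless of the chosen config.
    return True


# Each supported pattern maps to the checker that validates a concrete
# (pattern, QuantizationConfig) combination.
SUPPORT_DICT = {
    ("conv2d",): check_conv,
    ("conv2d", "relu"): check_conv,  # conv fused with an activation
    ("add",): check_always,
}

int8_cfg = {"activation_dtype": "int8"}
int16_cfg = {"activation_dtype": "int16"}

conv_ok = SUPPORT_DICT[("conv2d",)](("conv2d",), int8_cfg)        # accepted
conv_rejected = SUPPORT_DICT[("conv2d",)](("conv2d",), int16_cfg)  # rejected
```

In this model, "pattern not in the dict" means the backend does not handle it at all, while "checker returns False" means the pattern exists but this particular configuration is unsupported.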

SharedQspecQuantizer

The SharedQspecQuantizer is always applied after all other quantizers and aims to handle nodes which the user typically doesn't care about and which should just work. This refers to, for example, comparison ops, max/min-ops, and data movement ops such as copies, transposes and concats. These are simply annotated with a SharedQspec, with some extra logic to handle edge cases.
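The shared-qspec behaviour can be illustrated with a toy sketch, where ops and qspecs are modeled as strings and all names are hypothetical (the real SharedQspecQuantizer operates on graph nodes and QuantizationSpec objects):

```python
# Ops that should "just work" by sharing their input's quantization spec.
SHARED_QSPEC_OPS = {"cat", "transpose", "clone", "maximum", "minimum"}


def annotate_shared(op, input_qspecs):
    """Return the qspec to share across inputs/output, or None if not handled."""
    if op not in SHARED_QSPEC_OPS or not input_qspecs:
        return None
    # The output inherits the spec of the (first) input, matching the
    # "inherit dtype from its input" behaviour described above.
    return input_qspecs[0]


out = annotate_shared("transpose", ["INT8_PER_TENSOR_QSPEC"])
```

Edge cases (e.g. inputs with conflicting specs) would need the extra logic mentioned above; this sketch only shows the default inherit-from-input path.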

Quantizer ordering

The quantizers are generally applied "bottom-up", so the quantizer added last is applied first, and previous annotations are never overwritten. The exception is the quantizer configured by set_global, which is always applied second-last as the default quantizer, and the SharedQspecQuantizer, which is applied last as previously noted.
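The ordering rules can be sketched like this, with quantizers modeled as simple (name, annotate) pairs; this is a toy model of the described behaviour, not the backend's actual implementation:

```python
def apply_quantizers(nodes, user_quantizers, global_quantizer, shared_quantizer):
    """Apply user quantizers bottom-up (last added runs first), then the
    set_global default, then the shared-qspec quantizer. First annotation wins."""
    annotations = {}
    ordered = list(reversed(user_quantizers)) + [global_quantizer, shared_quantizer]
    for name, annotate in ordered:
        for node in nodes:
            if node in annotations:
                continue  # earlier quantizers win; never overwrite
            qspec = annotate(node)
            if qspec is not None:
                annotations[node] = (name, qspec)
    return annotations


# Toy quantizers: one targeting sigmoid, a global default, and a shared fallback.
sigmoid_q = ("sigmoid_q", lambda n: "NO_QSPEC" if n == "sigmoid" else None)
global_q = ("global_q", lambda n: "INT8" if n != "cat" else None)
shared_q = ("shared_q", lambda n: "SHARED")

result = apply_quantizers(["sigmoid", "add", "cat"], [sigmoid_q], global_q, shared_q)
```

Here the sigmoid keeps its targeted annotation even though the global quantizer also matches it, and the cat node falls through to the shared-qspec fallback.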

A final important detail is that quantization is run twice to support mixed int/float quantization. This relates to the graph transforms applied in transform_for_annotation, which handle decompositions required before quantization. The first run of the quantizer marks which nodes should be decomposed by these transforms, while the second performs the actual annotation.

Quantizer reporter

The reporter prints a short per-quantizer summary at info-level logging and a per-operator report at debug-level logging. Here is an example of what the report looks like when one sigmoid has been selected not to be quantized, while all other nodes are int8.

----------------------------------------------------------------------------------------------------
                                     FINAL QUANTIZATION REPORT                                      
----------------------------------------------------------------------------------------------------
PatternQuantizer using ModuleTypeNodeFinder targeting module types: Sigmoid
Annotating with NO_QSPEC
Supported operators and patterns defined by TOSA_QUANTIZER_SUPPORT_DICT
   Accepted nodes: 1
   Rejected due to previous annotation: 0
   Rejected nodes: 0

   NODE NAME    INPUT QSPEC MAP    OUTPUT QSPEC MAP
   -----------  -----------------  ------------------
   sigmoid      add: NO_QSPEC      NO_QSPEC
----------------------------------------------------------------------------------------------------
PatternQuantizer using GlobalNodeFinder targeting all nodes
Annotating with INT8_TOSA_QCONFIG
Supported operators and patterns defined by TOSA_QUANTIZER_SUPPORT_DICT
   Accepted nodes: 5
   Rejected due to previous annotation: 1
   Rejected nodes: 0

   NODE NAME    INPUT QSPEC MAP                 OUTPUT QSPEC MAP
   -----------  ------------------------------  ---------------------
   x                                            INT8_PER_TENSOR_QSPEC
   y                                            INT8_PER_TENSOR_QSPEC
   add          x: INT8_PER_TENSOR_QSPEC        INT8_PER_TENSOR_QSPEC
                y: INT8_PER_TENSOR_QSPEC
   mul          sigmoid: INT8_PER_TENSOR_QSPEC  INT8_PER_TENSOR_QSPEC
                x: INT8_PER_TENSOR_QSPEC
   output       mul: INT8_PER_TENSOR_QSPEC      NO_QSPEC
----------------------------------------------------------------------------------------------------
SharedQspecQuantizer using 
Annotating with SHARED_QCONFIG
Supported operators and patterns defined by executorch.backends.cortex_m.quantizer.quantizer.SharedQspecQuantizer.SHARED_QSPEC_OPS_DEFAULT
   No patterns accepted or rejected.

----------------------------------------------------------------------------------------------------
Non annotated nodes:
    None
----------------------------------------------------------------------------------------------------

Implementation plan

The Cortex-M quantizer is in the process of being updated to have the same level of support as the old TOSAQuantizer. Once that is ready, implementing the new TOSAQuantizer will only be a matter of creating a TOSA support_dict, TOSA QuantizationConfigs, and some interface glue.

The new TOSAQuantizer will first be available as an experimental feature for some time before it is made the default and the old TOSAQuantizer is deprecated. Feedback during this period is much appreciated and can be posted in this thread.

cc @digantdesai @SS-JIA @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

Labels: module: arm (Issues related to arm backend), partner: arm (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm)