Background
Right now, quantization configs are serialized through the following lifecycle:
1. apply_quantization_config is used to attach quantization_scheme attributes to modules
2. The model undergoes calibration and compression
3. The quantization config is regenerated from the model using QuantizationConfig.from_pretrained
4. The new config is serialized by ModelCompressor.update_config
This approach has some downsides (see phi3 example config):
- Any config group names set by the user are discarded
- The config groups which are generated do not necessarily match the config groups set by the user
- The ignore list becomes very large and ugly to read
- The logic for generating a config from a model is very difficult to maintain
The scope of this issue is to investigate an approach in which step (1) attaches the config as a quantization_config attribute on the model, which step (4) then reads directly, without having to go through step (3). This would mitigate all of the above downsides.
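A rough sketch of the proposed flow, to make the idea concrete. Everything here is a hypothetical stand-in (SchemeSketch, ConfigSketch, ModuleSketch, apply_config_sketch, serialize_config_sketch); none of it is the real compressed-tensors API, and target matching is simplified to a plain string comparison.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class SchemeSketch:
    """Hypothetical stand-in for a quantization scheme."""
    targets: List[str]
    num_bits: int = 8


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for a quantization config."""
    config_groups: Dict[str, SchemeSketch]
    ignore: List[str] = field(default_factory=list)


class ModuleSketch:
    """Minimal stand-in for a model module (not torch.nn.Module)."""

    def __init__(self, name: str, kind: str,
                 children: Optional[List["ModuleSketch"]] = None):
        self.name = name
        self.kind = kind
        self.children = children or []

    def walk(self) -> Iterator["ModuleSketch"]:
        yield self
        for child in self.children:
            yield from child.walk()


def apply_config_sketch(model: ModuleSketch, config: ConfigSketch) -> None:
    # Step (1): attach a scheme to each matching, non-ignored module.
    for module in model.walk():
        if module.name in config.ignore:
            continue
        for scheme in config.config_groups.values():
            if module.kind in scheme.targets:
                module.quantization_scheme = scheme
    # Proposed addition: keep the user's config on the model itself.
    model.quantization_config = config


def serialize_config_sketch(model: ModuleSketch) -> dict:
    # Step (4): read the attached config directly, skipping step (3).
    return asdict(model.quantization_config)


q_proj = ModuleSketch("q_proj", "Linear")
lm_head = ModuleSketch("lm_head", "Linear")
model = ModuleSketch("model", "Model", [q_proj, lm_head])

config = ConfigSketch(
    config_groups={"attention_group": SchemeSketch(targets=["Linear"])},
    ignore=["lm_head"],
)
apply_config_sketch(model, config)
serialized = serialize_config_sketch(model)
print(serialized)
```

Because the serialized output comes from the attached config rather than from per-module attributes, the user's group name and the short ignore list survive unchanged in this sketch.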
Some things to keep in mind
apply_quantization_config may be applied multiple times, which may necessitate some logic to "merge" quantization configs. A draft of this logic has already been written; feel free to ping @kylesayrs if you would like to leverage it, or feel free to write your own.
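For illustration only, one possible merge policy is sketched below. It is not the existing draft mentioned above. The function name merge_configs_sketch, the plain-dict config layout, and the policy itself (union the groups, raise on a name collision with a different scheme, concatenate and de-duplicate the ignore lists) are all assumptions. How to handle a module that one config ignores and another targets is left open.

```python
from typing import Any, Dict


def merge_configs_sketch(first: Dict[str, Any],
                         second: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two config dicts under an assumed, simplified policy."""
    merged_groups = dict(first.get("config_groups", {}))
    for name, scheme in second.get("config_groups", {}).items():
        # Same name with a different scheme is treated as a user error.
        if name in merged_groups and merged_groups[name] != scheme:
            raise ValueError(f"conflicting definitions for config group {name!r}")
        merged_groups[name] = scheme

    # Concatenate ignore lists, dropping duplicates but keeping order.
    combined = first.get("ignore", []) + second.get("ignore", [])
    merged_ignore = list(dict.fromkeys(combined))

    return {"config_groups": merged_groups, "ignore": merged_ignore}


first = {
    "config_groups": {"attention": {"targets": ["Linear"], "num_bits": 8}},
    "ignore": ["lm_head"],
}
second = {
    "config_groups": {"mlp": {"targets": ["Linear"], "num_bits": 4}},
    "ignore": ["lm_head", "embed_tokens"],
}
merged = merge_configs_sketch(first, second)
print(merged)
```

Raising on conflicting group names keeps the sketch conservative; a real implementation might instead let the later call win or rename the colliding group.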