Releases: openvinotoolkit/nncf

v2.0.1

26 Oct 09:18

Target version updates:

  • Bumped target framework versions to PyTorch 1.9.1 and TensorFlow 2.4.3
  • Increased target HuggingFace transformers version for the integration patch to 4.9.1

Bugfixes:

  • Fixed statistic collection for the algo mixing scenario
  • Increased pruning algorithm robustness in cases of a disconnected NNCF graph
  • NNCF graph PNG rendering failures are no longer fatal
  • Fixed README command lines
  • (PyTorch) Fixed a bug with quantizing shared weights multiple times
  • (PyTorch) Fixed knowledge distillation failures in CPU-only and DataParallel scenarios
  • (PyTorch) Fixed sparsity application for torch.nn.Embedding and EmbeddingBag modules
  • (PyTorch) Added GroupNorm + ReLU as a fusable pattern
  • (TensorFlow) Fixed gamma fusion handling for pruning TF BatchNorm
  • (PyTorch) Fixed pruning for models where operations have multiple convolution predecessors
  • (PyTorch) Fixed the NNCFNetwork wrapper so that self in calls to the wrapped model refers to the wrapper NNCFNetwork object and not to the wrapped model
  • (PyTorch) Fixed tracing of view operations to handle shape arguments with the torch.Tensor type
  • (PyTorch) Added matmul ops to be considered for fusing
  • (PyTorch, TensorFlow) Fixed tensorboard logging for accuracy-aware scenarios
  • (PyTorch, TensorFlow) Fixed FLOPS calculation for grouped convolutions
  • (PyTorch) Fixed knowledge distillation failures for tensors of unsupported shapes - output tensors with unsupported shapes are now ignored instead of causing a crash

v2.0.0

20 Jul 08:49

New features:

  • Added TensorFlow 2.4.2 support - NNCF can now be used to apply the compression algorithms to models originally trained in TensorFlow.
    NNCF with TensorFlow backend supports the following features:

    • Compression algorithms:
      • Quantization (with HW-specific targeting aligned with PyTorch)
      • Sparsity:
        • Magnitude Sparsity
        • RB Sparsity
      • Filter pruning
    • Support is limited to Keras models consisting of standard Keras layers and created via:
      • Keras Sequential API
      • Keras Functional API
    • Automatic, configurable model graph transformation to obtain the compressed model.
    • Distributed training on multiple GPUs on one machine is supported using tf.distribute.MirroredStrategy.
    • Exporting compressed models to SavedModel or Frozen Graph format, ready to use with OpenVINO™ toolkit.
  • Added model compression samples for NNCF with TensorFlow backend:

    • Classification
      • Keras training loop.
      • Models from the tf.keras.applications module (ResNets, MobileNets, Inception, etc.) are supported.
      • TensorFlow Datasets (TFDS) and TFRecords (ImageNet2012, Cifar100, Cifar10) are supported.
      • Compression results are claimed for MobileNet V2, MobileNet V3 small, MobileNet V3 large, ResNet50, Inception V3.
    • Object Detection
      • Custom training loop.
      • TensorFlow Datasets (TFDS) and TFRecords for COCO2017 are supported.
      • Compression results are claimed for RetinaNet, YOLOv4.
    • Instance Segmentation
      • Custom training loop.
      • TFRecords for COCO2017 are supported.
      • Compression results are claimed for MaskRCNN.
  • Accuracy-aware training is now available for filter pruning and sparsity, achieving the best compression results within a given accuracy drop threshold in a fully automated fashion.

  • Framework-specific checkpoints produced with NNCF now have NNCF-specific compression state information included, so that the exact compressed model state can be restored/loaded without having to provide the same NNCF config file that was used during the creation of the NNCF-compressed checkpoint

  • Common interface for compression methods for both PyTorch and TensorFlow backends (https://github.com/openvinotoolkit/nncf/tree/develop/nncf/api); see the usage sketch at the end of this list.

  • (PyTorch) Added an option to specify an effective learning rate multiplier for the trainable parameters of the compression algorithms via NNCF config, for finer control over which should tune faster - the underlying FP32 model weights or the compression parameters.

  • (PyTorch) Unified scales for concat operations - the per-tensor quantizers that affect the concat operations will now have identical scales so that the resulting concatenated tensor can be represented without loss of accuracy w.r.t. the concatenated subcomponents.

  • (TensorFlow) Algo-mixing: Added configuration files and reference checkpoints for filter-pruned + quantized models: ResNet50@ImageNet2012(40% of filters pruned + INT8), RetinaNet@COCO2017(40% of filters pruned + INT8).

  • (Experimental, PyTorch) Learned Global Ranking filter pruning mechanism for better pruning ratios with less accuracy drop for a broad range of models has been implemented.

  • (Experimental, PyTorch) Knowledge distillation is now supported, ready to be used with any compression algorithm to produce an additional loss source computed between the outputs of the compressed model and its uncompressed counterpart
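
    The following is a minimal usage sketch of the common interface with the TensorFlow backend. The config values, calibration dataset, and model choice are illustrative placeholders, not a definitive recipe:

        import tensorflow as tf

        from nncf import NNCFConfig
        from nncf.tensorflow import create_compressed_model, register_default_init_args

        # A standard Keras model built from standard Keras layers.
        model = tf.keras.applications.MobileNetV2(weights=None)

        # Illustrative NNCF config enabling INT8 quantization.
        nncf_config = NNCFConfig.from_dict({
            "input_info": {"sample_size": [1, 224, 224, 3]},
            "compression": {"algorithm": "quantization"}
        })

        # Register a representative dataset for the data-dependent initialization step
        # (random data here purely to keep the example self-contained).
        calibration_dataset = tf.data.Dataset.from_tensor_slices(
            tf.random.uniform([32, 224, 224, 3])).batch(1)
        nncf_config = register_default_init_args(nncf_config, calibration_dataset, batch_size=1)

        # The common interface returns a compression controller plus the wrapped model;
        # the controller later drives statistics, scheduling and export.
        compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)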

Breaking changes:

  • CompressionLevel has been renamed to CompressionStage
  • "ignored_scopes" and "target_scopes" no longer allow prefix matching - use full-fledged regular expression approach via {re} if anything more than an exact match is desired.
  • (PyTorch) Removed version-agnostic name mapping for ReLU operations, i.e. the NNCF configs that referenced "RELU" (all caps) as an operation name will now have to reference an exact ReLU PyTorch function name such as "relu" or "relu_"
  • (PyTorch) Removed the example of code modifications (Git patches and base commit IDs are provided) for mmdetection repository.
  • Batchnorm adaptation "forgetting" step has been removed since it has been observed to introduce accuracy degradation; the "num_bn_forget_steps" parameter in the corresponding NNCF config section has been removed.
  • Framework-specific requirements are no longer installed during pip install nncf or python setup.py install and are assumed to be present in the user's environment; pip's "extras" syntax must be used to install the BKC requirements, e.g. by executing pip install nncf[tf], pip install nncf[torch] or pip install nncf[tf,torch]
  • "quantizable_subgraph_patterns" option removed from the NNCF config

Bugfixes:

  • (PyTorch) Fixed a hang with batchnorm adaptation being applied in DDP mode
  • (PyTorch) Fixed tracing of the operations that return NotImplemented

v1.7.1

06 May 11:02
  • Fixed a bug where compressed models that were supposed to return named tuples actually returned regular tuples
  • Fixed an issue with batch norm adaptation-enabled compression runs hanging in the DDP scenario

v1.7.0

19 Apr 12:29

New features:

  • Added the Adjust Padding feature to support accurate execution of U4 on VPU - when setting "target_device" to "VPU", the training-time padding values for quantized convolutions will be adjusted to better reflect the VPU inference process.
  • Weighted layers that are "frozen" (i.e. have requires_grad set to False at compressed model creation time) are no longer considered for compression, to better handle transfer learning cases.
  • Quantization algorithm now sets up quantizers without giving an option for requantization, which guarantees best performance, although at some cost to quantizer configuration flexibility.
  • Pruning models with FCOS detection heads and instance normalization operations is now supported
  • Added a mean percentile initializer for the quantization algorithm
  • It is now possible to additionally quantize model outputs (separate control for the quantization of each output is supported); see the config sketch after this list
  • Models quantized for CPU now use effective 7-bit quantization for weights - the ONNX-exported model is still configured to use 8 bits for quantization, but only the middle 128 quanta of the total possible 256 are actually used, which allows for better OpenVINO inference accuracy alignment with PyTorch on non-VNNI CPUs
  • Bumped target PyTorch version to 1.8.1 and relaxed package requirements constraints to allow installation into environments with PyTorch >=1.5.0
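
    A hedged config sketch combining two of the options above; the keys follow the NNCF config schema, but the values and the model they would apply to are illustrative:

        nncf_config_fragment = {
            # Adjusts training-time padding of quantized convolutions to match VPU inference:
            "target_device": "VPU",
            "compression": {
                "algorithm": "quantization",
                # Additionally quantize the model outputs:
                "quantize_outputs": True
            }
        }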

Notable bugfixes:

  • Fixed bias pruning in depthwise convolution
  • Made per-tensor quantization available for all operations that support per-channel quantization
  • Fixed progressive training performance degradation when an output tensor of an NNCF-compressed model is reused as its input.
  • Installing NNCF from a checked-out repository via pip install . is now supported.
  • Nested with no_nncf_trace() blocks now function as expected.
  • The NNCF compression API is now formally abstract to guard against unintended virtual function calls
  • Now possible to load AutoQ and HAWQ-produced checkpoints to evaluate them or export to ONNX

Removed features:

  • Pattern-based quantizer setup mode for quantization algorithm - due to its logic, it did not guarantee that all required operation inputs are ultimately quantized.

v1.6.0

29 Jan 14:25
  • Added AutoQ - an AutoML-based mixed-precision initialization mode for quantization, which utilizes the power of reinforcement learning to select the best quantizer configuration for any model in terms of quality metric for a given HW architecture type.
  • NNCF now supports inserting compression operations as pre-hooks to PyTorch operations, instead of relying on post-hooks alone; the flexibility of quantization setups has been improved as a result of this change.
  • Improved the pruning algorithm to group together dependent filters from different layers in the network and prune these together
  • Extended the ONNX compressed model exporting interface with an option to explicitly name input and output tensors
  • Changed the compression scheduler so that the corresponding epoch_step and step methods should now be called at the beginning of the epoch and before the optimizer step, respectively (previously these were called at the end of the epoch and after the optimizer step); see the training-loop sketch after this list
  • Data-dependent compression algorithm initialization is now specified in terms of dataset samples instead of training batches, e.g. "num_init_samples" should be used in place of "num_init_steps" in NNCF config files.
  • Custom user modules to be registered for compression can now be specified to be ignored for certain compression algorithms
  • Batch norm adaptation is now applied by default for all compression algorithms
  • Bumped target PyTorch version to 1.7.0
  • Custom OpenVINO operations such as "FakeQuantize" that appear in NNCF-exported ONNX models now have their ONNX domain set to org.openvinotoolkit
  • The quantization algorithm will now quantize nn.Embedding and nn.EmbeddingBag weights when targeting CPU
  • Added an option to optimize logarithms of quantizer scales instead of scales themselves directly, a technique which improves convergence in certain cases
  • Added reference checkpoints for filter-pruned models: UNet@Mapillary (25% of filters pruned), SSD300@VOC (40% of filters pruned)
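
    A minimal PyTorch training-loop sketch showing the new scheduler call placement; compression_ctrl and compressed_model are assumed to come from create_compressed_model, and optimizer, criterion, train_loader and num_epochs are placeholders:

        for epoch in range(num_epochs):
            # Now called at the beginning of the epoch (previously at the end):
            compression_ctrl.scheduler.epoch_step()
            for inputs, targets in train_loader:
                # Now called before the optimizer step (previously after):
                compression_ctrl.scheduler.step()
                optimizer.zero_grad()
                loss = criterion(compressed_model(inputs), targets) + compression_ctrl.loss()
                loss.backward()
                optimizer.step()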

Release v1.5.0 of NNCF to master (#254)

06 Nov 14:39
1a8bbb7
* Allow sharing activation quantizers in different graph points (#67)

* Update version and docs on develop (#77)

* Update 3rd party integration patches (#79)

* Doc updates (#84)

* Add info on export to Usage.md

* Fix third party headers

* Fix import in transformers patch (#85)

* Fix percentile per-channel init (#86)

Fixes: #83

* Omit nodes called during debugging from entering NNCF graph (#87)

* Enable custom range initializers for overridden scopes in schema (#89)

* Enable custom quantization configs and initializers for overridden scopes in schema

* code style

* remove range config duplication

* obsolete import

* Fix model saving in transformers patch (#91)

* Patch TracedTensor's __repr__ method instead of torch.Tensor's (#92)

* Fix mmdetection patch (#93)

* Update mmdetection patch to v2.3.0 (#95)

* Allow registering user modules as NNCF modules for weight quantization (#99)

* Assign latest tensor shape during ForwardTraceOnly() (#96)

* Enable GPT2 ops (#98)

* Fix HW config scenario with ops missing in HW config definition (#94)

* Fix input quantization in case of embeddings (#97)

* Added sanity tests for third party integration (#45)

* Expose quantizer linking through config (#100)

* Add citing section to frontpage README (#103)

* Fix bad rebase in asymmetric quantization ONNX export (#104)

* Use default quantizer configuration for op weights not specified in HW config (#105)

* Update transformers to v3.0.2 (#107)

* Fix symmetric quantizer per-channel init for max values close to 0 (#109)

* Add unified scales in HW config operation (via quantizer linking) (#108)

* Add quantization metric (#33)

* Make HW config parsing conform to the implicit rules (#111)

(except for the "any supported quantization for the ops in config
without specified quantizations", because they need config wildcarding,
to be implemented as a follow-up)

* Fix MobileNetV2 INT8 config (#113)

* Use sequential sampling for evaluation across example scripts (#114)

Hopefully this will make nightly compression training "eval" tests
more stable.

* Fix third_party_sanity tests (#115)

* Properly handle ops in HW config without quantization configs associated (#119)

These get associated with a "wildcard" propagating quantizer, which
will either get merged with any other quantizer during propagation,
or get assigned a default quantization config.

* Make criterion optional in signature of register_default_init_args() (#121)

* make criterion optional in the signature of register_default_init_args()

* update README.md as Vasiliy asked

* Add Googlenet with pruning configs  (#122)

* Fix pretrained (#125)

* Mark Convs as non-depthwise for 1 input channel case (#126)

* Add non-RELU activations to fusable patterns (#124)

* Fixed Pylint warnings (#129)

* Fix bug with CompositeCompressionAlgorithmController export_model() signature (#132)

* Add per-layer initialization of ranges. (#116)

* Add prepare_for_export() to perform pre-export steps for CompressionAlgorithmController; update for CompositeCompressionAlgorithmController (#138)

* Fix PyLint. (#139)

* Introduced compression ratio parameter for Mixed Precision init (#133)

* Introduced compression ratio parameter for Mixed Precision init

It's used for choosing optimal mixed precision configuration for a given ratio.

The compression ratio of mixed-precision quantization is calculated relative to the fully INT8 configuration.
The total bit complexity of the model is the sum, over each quantized layer (Conv, Deconv, Linear), of the layer's FLOPs multiplied by the number of bits used for its quantization. The ratio is used to estimate the performance boost of the quantized model; it is a better proxy for the amount of computation than the number of parameters multiplied by bitwidth. A sketch of this calculation follows these notes.

* Added link to the full configuration file with template usage

* disclaimer about model specific params in template

* corrected articles, contractions, mixed precision-> mixed-precision
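
A sketch of the described calculation, with hypothetical per-layer FLOPs and bitwidths; the helper name is made up for illustration:

    def mixed_precision_compression_ratio(layer_flops, layer_bits):
        # Bit complexity of a configuration: sum over quantized layers
        # (Conv, Deconv, Linear) of the layer's FLOPs times its bitwidth.
        mixed_cost = sum(flops * bits for flops, bits in zip(layer_flops, layer_bits))
        int8_cost = sum(flops * 8 for flops in layer_flops)
        # The ratio is taken relative to the fully INT8 configuration.
        return int8_cost / mixed_cost

    # Example: three layers, two quantized to INT4, one kept at INT8.
    print(mixed_precision_compression_ratio([1e9, 2e9, 5e8], [4, 4, 8]))  # 1.75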

* Fix bug with NoCompressionAlgorithmController (#150)

* Set data loading workers to 0 across tests to force single process (#162)

* Set data loading workers to 0 across tests to force single process

Could fix the consequences of https://github.com/pytorch/pytorch/issues/39570

* Remove more-itertools dependency

* Specify NNCF import order in docs (#161)

* Specify NNCF import order in docs

* Fix frontpage integration instructions

* Bump mmdetection version to 2.4.0 (#166)

* Fix command line creation for test_compression_training (#167)

* Improve eval test code (#160)

* Fix bug with different torch devices in get_scale_zp_from_input_low_input_high (#158)

* Fix third_party_sanity and eval test bugs (#169)

* Fix mmdetection dataset search path for SSD (#176)

* Test stability (#179)

* Increase eval threshold for test_compression_training cases

CUDA computation seems to inherently cause differences of at least
0.01% in accuracy metric computation between the train and eval
runs

* Reduce batch size for SSD512 eval CI runs (avoid OOM)

* Renamings (#178)

* Fixed disabling gradients of quantizers for HAWQ (#184)

* Corrected default values in range initializers (#183)

- The minimum and maximum values for mean_min_max no longer skip the check for statistics that were not collected, which prevents initialization with inf values.
- Percentile init no longer crashes by default

* Refactor imports in setup.py (#182)

Important for CI

* Fix security issues with imports (#185)

* Fix paths to COCO in mmdetection third party sanity tests (#186)

* Build graphs within the torch.no_grad() context (#187)

Should reduce memory usage during create_compressed_model

* Fix security issues directly in code (#189)

* Return zero-valued torch.Tensor in CompressionLoss by default instead of int (#190)

* Make default install support non-GPU cases (#193)

* Fixed backward compatibility test (#195)

* Improve quantizer setup for hanging batchnorm nodes (#192)

* Do not merge subgraphs if subgraph has more than one output node

* Mark BatchNorm as INPUTS_QUANTIZABLE by default

Will manifest itself in case there is a batch norm operation that
was not merged to any previous op, i.e. should accept quantized
input instead of FP32

* Fix export for nodes with metatypes not redefined by pruning algo (#171)

* Add more security fixes (#197)

* Removed double logging to stdout (#198)

* ignore frozen layers during filter pruning (#200)

* Use latest matplotlib version (#206)

* Use propagation based mode by default (#181)

* Set propagation_based mode by default.

* Fix compressed graphs.

* Fix quantize inputs option.

* Add operator metatypes for 'sigmoid' and 'add' operator (#209)

* Add operator metatypes for 'sigmoid' and 'add' operator

* remove trailing spaces

Co-authored-by: Chua, Vui Seng <[email protected]>

* Introduced `enabled` parameter for Quantizers (#194)

Also:
* corrected script to add new quantization parameters to checkpoints
* added warning on exporting disabled quantizations
* print statistics about enabled quantizers by default

* Update documentation (#219)

* Update documentation.

* Update docs. Add dependencies for param to json schema.

* To fix cpu_only part (#221)

* Update the cpu_only part of the dockerfile; fix issue with setup.py install with the --cpu-only opt; fix README.md

* apply remarks

* Fix register_operator (#224)

* Add per-layer sparsity. (#127)

* Do not call _quantize_inputs for propagation based mode (#229)

* Consistent bitwidth for activations and weight in propagation mode (#191)

* Added sota eval tests via AC (#142)

* Refactored HAWQ: split functionality into separate files (#232)

* Allow quantizing modules that share their weights for multiple operations (#235)

* Filter quantizers that directly act upon integer inputs (#228)

* Add support sparsity freeze epoch for magnitude sparsity. (#218)

* Liberal bitwidth assignment mode by default on precision initialization (#222)

* Fix AdaptiveSparsityScheduler. (#236)

* Fix threesigma init (#240)

* Build extensions in a temporary folder (#239)

* Criterion generalization for HAWQ algorithm (#230)

* Criterion generalization for HAWQ algorithm

* scope_node -> node_scope

* Documentation update

* Described in docs when to use additional parameter 'criterion_fn'

* fix quantization range initialization in case of 1 scale channel (#241)

fix quantization range initialization in the case of 1 scale channel, to avoid initializing from only a single slice of the data (data[0]) while ignoring the rest (data[1], data[2], ...)

* Patch Semantic Segmentation Application to export onnx and test with resume flag (#244)

Co-authored-by: Chua, Vui Seng <[email protected]>

* Add DW-conv to input quantizable op. (#220)

* Fixed skip Openvino tests and preinstall (#246)

* Corrected handling of barrier on the graph traverse (#249)

* Extend input handling flexibility (#242)

* Handle inputs better using input_infos

* Update nncf/model_creation.py

* Corrected handling Inception outputs in classification sample (#251)

* Change quantization levels for SymmetricQuantizer from 255 to 256 (#225)

* Change quantization levels for SymmetricQuantizer from 255 to 256

* Update test_functions with new level

* Fix bug with weights range; make the formulas depend on only one value - levels - thereby reducing the chance of a mistake (see the sketch after this commit's notes)

* Fix PyLint

* Update HW configs with new quantization level_low

* Fix bug with float type

* Change type() to isinstance()

* Change return values order in calculate_level_ranges
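
A hedged sketch of the level-range arithmetic implied by this change (the actual NNCF function differs in signature; this only illustrates deriving everything from the single levels value):

    def calculate_level_ranges(num_bits=8, signed=True):
        levels = 2 ** num_bits            # 256 for 8 bits - previously 255 were used
        if signed:
            level_low = -(levels // 2)    # -128
            level_high = levels // 2 - 1  # 127
        else:
            level_low = 0
            level_high = levels - 1       # 255
        return level_low, level_high, levels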

* Fix bug with export to Q/DQ (#248)

* Fix bug with export to Q/DQ

Add hack of export processing for our old checkpoints
Add Exception raising for exporting per-channel Q/DQ layers, as PyTorch
ONNX exporting supports only per-tensor.

* Fix Pylint

* Update layers.py

* Fix bug in AsymmetricQuantizer export; Add tests

* Fix pylint

Co-authored-by: Vasily Shamporov <[email protected]>

* Update results and links to the checkpoints (#253)

* Update documentation for release v1.5.0 (#252)

* Update documentation for release v1.5.0

* Corrected HAWQ documentation

* Add per-range initialization notes

Co-authored-by: Lyalyushkin Nikolay <[email protected]>

* Add Mask-RCNN-R50FPN-INT8 config for mmdetection (#174)

* rebase

* add third-party sanity tests for Mask-RCNN IS model

* add Mask-RCNN accuracy results to tables

* fix link in README

* add instance segmentation ref to README

* fix voc path

* fix retinanet config

* Update version.py

Co-authored-by: Ivan Lazarevich <[email protected]>
Co-authored-by: Pave Finashov <[email protected]>
Co-authored-by: Anastasia Senina <[email protected]>
Co-authored-by: Aleksei Kashapov <[email protected]>
Co-authored-by: Maria Kaglinskaya <[email protected]>
Co-authored-by: Lyalyushkin Nikolay <[email protected]>
Co-authored-by: vuiseng9 <[email protected]>
Co-authored-by: Chua, Vui Seng <[email protected]>
Co-authored-by: Fyodor Kutsepin (aka Oddy O) <[email protected]>
Co-authored-by: krodyush <[email protected]>

v1.4.1

28 Jul 17:11

Fixed packaging

v1.4

28 Jul 14:03
07f56d9
Forced CI run for NNCF release v1.4 (#74)

v1.3.2

08 Jun 11:43
9fc4a92

Documentation updates

v1.3.1

03 Jun 14:32
3a11f3d
Release v1.3.1 of NNCF on GitHub