Releases: microsoft/Olive

Olive-ai 0.8.0

17 Mar 22:14

New Features (Passes)

  • QuaRot performs offline weight rotation
  • SpinQuant performs offline weight rotation
  • StaticLLM converts a dynamically shaped LLM into a statically shaped LLM for NPUs.
  • GraphSurgeries applies surgeries to an ONNX model. Surgeries are modular and individually configurable.
  • LoHa, LoKr and DoRA finetuning
  • OnnxQuantizationPreprocess applies quantization preprocessing.
  • EPContextBinaryGenerator creates EP-specific context binary ONNX models.
  • ComposeOnnxModels composes split ONNX models.
  • OnnxIOFloat16ToFloat32 is replaced by the more generic OnnxIODataTypeConverter.

Command Line Interface

New command line tools have been added and existing tools have been improved.

  • generate_config_file option to save the workflow config file.
  • extract-adapters command to extract multiple adapters from a PyTorch model.
  • Simplified quantize command

Improvements

  • Better output model structure for workflow and CLI runs.
    • New no_artifacts option in workflow config to disable saving run artifacts such as footprints.
  • Hf data preprocessing:
    • Dataset is truncated if max_samples is set.
    • Empty text entries are filtered out.
    • padding_side is configurable and defaults to "right".
  • SplitModel pass keeps QDQ nodes together in the same split.
  • OnnxPeepholeOptimizer: constant folding + onnxoptimizer added.
  • CaptureSplitInfo: Separate split for memory intensive module.
  • OnnxConversion:
    • Dynamic shapes for dynamo export.
    • optimize option to perform constant folding and redundancy elimination on the dynamo exported model.
  • GPTQ: Default wikitext calibration dataset. Patch to support newer versions of transformers.
  • MatMulNBitsToQDQ: nodes_to_exclude option.
  • SplitModel: split_assignments option to provide custom split assignments.
  • CaptureSplitInfo: block_to_split can be a single block (str) or multiple blocks (list).
  • OnnxMatMul4Quantizer: Support onnxruntime 1.18+
  • OnnxQuantization:
    • Support onnxruntime 1.18+.
    • op_types_to_exclude option.
    • LLMAugmentedDataLoader augments the calibration data for llms with kv cache and other missing inputs.
  • New document theme and organization.
  • Reimplement search logic to include passes in search space.
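
The Hf data preprocessing changes listed above (truncation when max_samples is set, filtering of empty text) can be pictured with a small sketch. This is not Olive's implementation: the function name preprocess_texts is hypothetical, and the order of operations (filter first, then truncate) is an assumption. The padding_side setting is not modelled here.

```python
from typing import List, Optional


def preprocess_texts(texts: List[str], max_samples: Optional[int] = None) -> List[str]:
    """Hypothetical helper: drop empty texts, then truncate to max_samples."""
    # Filter out entries that are empty or whitespace-only.
    non_empty = [t for t in texts if t.strip()]
    # Truncate only when max_samples is set (assumed order: filter, then truncate).
    if max_samples is not None:
        non_empty = non_empty[:max_samples]
    return non_empty


samples = preprocess_texts(["hello", "", "  ", "world", "again"], max_samples=2)
print(samples)
```

When max_samples is left unset, the sketch keeps every non-empty entry.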

Examples:

  • New QNN EP examples:
    • SLMs:
      • Phi-3.5
      • Deepseek R1 Distill
      • Llama 3.2
    • MobileNet
    • ResNet
    • CLIP VIT
    • BAAI/bge-small-en-v1.5
    • Table Transformer Detection
    • adetailer
  • Deepseek R1 Distill Finetuning
  • timm MobileNet

Olive-ai 0.7.1.1

14 Nov 19:39

Same as 0.7.1 with updated dependencies for nvmo extra and NVIDIA TensorRT Model Optimizer example doc.

Refer to the 0.7.1 release notes for other details.

Olive-ai 0.7.1

12 Nov 20:57

Command Line Interface

New command line tools have been added and existing tools have been improved.

  • olive --help works as expected.
  • auto-opt:
    • The command chooses a set of passes compatible with the provided model type, precision and accelerator information.
    • New options to split a model, either using --num-splits or --cost-model.

Improvements

  • ExtractAdapters:
    • Support lora adapter nodes in Stable Diffusion unet or text-embedding models.
    • Default initializers for quantized adapter to run the model without adapter inputs.
  • GPTQ:
    • Avoid saving unused bias weights (all zeros).
    • Set use_exllama to False by default to allow exporting and fine-tuning external GPTQ checkpoints.
  • AWQ: Patch autoawq to run quantization on newer transformers versions.
  • Atomic SharedCache operations
  • New CaptureSplitInfo and Split passes to split models into components. Number of splits can be user provided or inferred from a cost model.
  • disable_search is deprecated in the pass configuration of an Olive workflow config.
  • OrtSessionParamsTuning redone to use Olive search features.
  • OrtModelOptimizer renamed to OrtPeepholeOptimizer, plus some bug fixes.

Examples:

  • Stable Diffusion: New MultiLora Example
  • Phi3: New int quantization example using nvidia-modelopt

Olive-ai 0.7.0

16 Oct 23:00

Command Line Interface (CLI)

This release introduces a new command line interface for Olive that runs well-defined, concrete workflows without the user ever having to create or edit a config manually. CLI workflow commands can be chained, i.e. the output of one execution can be fed as input to the next, which simplifies running an entire pipeline. Below is a list of a few CLI workflow commands:

  • finetune: Fine-tune a model on a dataset using peft and optimize the model for ONNX Runtime.
  • capture-onnx-graph: Capture ONNX graph for a Huggingface model.
  • auto-opt: Automatically optimize a model for performance.
  • quantize: Quantize model using given algorithm for desired precision and target.
  • tune-session-params: Automatically tune the session parameters for an ONNX model.
  • generate-adapter: Generate ONNX model with adapters as inputs.

Improvements

  • Added support for yaml based workflow config
  • Streamlined DataConfig management
  • Simplified workflow configuration
  • Added shared cache support for intermediate models and supporting data files
  • Added QuaRot quantization pass for PyTorch models
  • Added support to evaluate generative PyTorch models
  • Streamlined support for user-defined evaluators
  • Enabled use of lm-evaluation-harness for generative model evaluations

Examples

  • Llama
    • Updated multi-lora example to use ORT generate() API
    • Updated to demonstrate use of shared cache
  • Phi3
    • Updated to demonstrate evaluation using lm-eval harness
    • Updated to showcase search across three different QLoRA ranks
    • Added Vision tutorial

Olive-ai 0.6.2

11 Jun 06:52

Workflow config

  • Support YAML files as workflow config file. #1191
  • Workflow id: this feature is a prerequisite for running a workflow on a remote VM. With this feature (#1179):
    • Cache dir will become <cache_dir>/<workflow_id>
    • The Olive config is automatically saved to the cache dir.
    • User can specify workflow_id in config file.
    • The default workflow_id is default_workflow.
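
The cache directory layout described above can be sketched as follows. The helper name resolve_cache_dir is hypothetical and is not part of Olive's API. Only the <cache_dir>/<workflow_id> layout and the default_workflow default come from the notes.

```python
from pathlib import Path

DEFAULT_WORKFLOW_ID = "default_workflow"  # default named in the release notes


def resolve_cache_dir(cache_dir: str, workflow_id: str = DEFAULT_WORKFLOW_ID) -> Path:
    """Hypothetical helper: return <cache_dir>/<workflow_id>."""
    return Path(cache_dir) / workflow_id


print(resolve_cache_dir("cache"))
print(resolve_cache_dir("cache", "phi3_finetune"))
```

Keeping one subdirectory per workflow id means that runs with different ids do not share cached files in this layout.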

Passes (optimization techniques)

  • Accept SNPE DLC model for QNN context binary generator #1188

Data

  • Remove params_config and components/component_args. All component-specific parameters are now grouped in four separate objects: #1187
    • load_dataset_config
    • pre_process_data_config
    • post_process_data_config
    • dataloader_config
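
As a rough illustration of the new grouping, the sketch below builds a data config that contains the four objects and flags configs that still use the removed fields. The checker check_data_config is hypothetical, and the contents of each group are left empty because the notes do not specify them.

```python
DATA_CONFIG_GROUPS = (
    "load_dataset_config",
    "pre_process_data_config",
    "post_process_data_config",
    "dataloader_config",
)
REMOVED_FIELDS = ("params_config", "components")


def check_data_config(data_config: dict) -> list:
    """Hypothetical checker: return the removed fields still present in a data config."""
    return [field for field in REMOVED_FIELDS if field in data_config]


# Placeholder values; the contents of each group are not specified in the release notes.
new_style = {group: {} for group in DATA_CONFIG_GROUPS}
old_style = {"params_config": {"batch_size": 1}}

print(check_data_config(new_style))
print(check_data_config(old_style))
```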

Docs

  • Add olive workflow schema to doc website. This schema file can be used in IDEs when writing workflow configs. #1190

Olive-ai 0.6.1

30 May 06:49

Example

  • Phi3 AzureML example. #1171

Passes (optimization techniques)

  • Pytorch
    • OnnxQuantization: Complete the qnn-ep related config items to support new features from onnxruntime-1.18

Data

  • Deprecate unused field DataComponentConfig::name #1178

Olive-ai 0.6.0

15 May 11:13

Examples

The following examples are added:

Olive CLI updates

  • Previous commands python -m olive.workflows.run and python -m olive.platform_sdk.qualcomm.configure are deprecated. Use olive run or python -m olive instead. #1129

Passes (optimization techniques)

  • Pytorch
    • AutoAWQQuantizer: Enables AutoAWQ in Olive and provides the capability for ONNX conversion. #1080
    • SliceGPT: Add support for generic data sets to SliceGPT pass #1145
  • ONNXRuntime
    • ExtractAdapters pass supports int4 quantized models and exposes the external data config options to users. #1083
    • ModelBuilder: Converts a Huggingface/AML generative PyTorch model to ONNX model using the ONNX Runtime Generative AI >= 0.2.0. #1089 #1073 #1110 #1112 #1118 #1130 #1131 #1141 #1146 #1147 #1154
    • OnnxFloatToFloat16: Use ort float16 converter #1132
    • NVModelOptQuantization: Quantizes an ONNX model with Nvidia-ModelOpt. #1135
    • OnnxIOFloat16ToFloat32: Converts float16 model inputs/outputs to float32. #1149
    • [Vitis AI] Make Vitis AI techniques compatible with ORT 1.18 #1140

Data Config

  • Remove name ambiguity in dataset configuration #1111
  • Remove HfConfig::dataset references in examples and tests #1113

Engine

  • Add aml deployment packaging. #1090

System

  • Make the accelerator EP optional in olive systems for non-onnx pass. #1072

Data

  • Add AML resource support for data configs.
  • Add audio classification data preprocess function.

Model

  • Provide built-in kv_cache_config for generative model's io_config #1121
  • Convert MLFlow transformers models to Hugging Face format so that they can be consumed by passes that need the Hugging Face format. #1150

Metrics

Dependencies:

Support onnxruntime 1.17.3

Issues

  1. Fix code scanning issues. #1078 #1081 #1084 #1085 #1091 #1094 #1103 #1104 #1107 #1126 #1124 #1128

Olive-ai 0.5.2

11 Apr 05:56

Examples

The following examples are added

  • Phi2 SliceGPT example #1052
  • Phi2 Genai example. #1061
  • Llama ExtractAdapters example. #1064

Passes (optimization techniques)

  • SliceGPT: SliceGPT is a post-training sparsification scheme that makes transformer networks smaller. It applies an orthogonal transformation to each transformer layer and then slices off the least-significant rows and columns of the weight matrices, which reduces the model size. This results in speedups and a reduced memory footprint.
  • ExtractAdapters: Extracts the LoRA adapter weights (float or statically quantized) and saves them in a separate file.
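
The idea behind the SliceGPT entry above can be illustrated with a toy example. This is a simplified sketch, not the pass itself: it uses two plain linear maps with no normalization or nonlinearity, and an arbitrary rotation stands in for the PCA-derived orthogonal matrix that SliceGPT computes. All values are made up.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


# Two consecutive linear maps with hidden width 2 (toy values).
w1 = [[1.0, 2.0], [3.0, 4.0]]
w2 = [[0.5, -1.0], [2.0, 1.5]]

# An orthogonal matrix Q (a rotation); arbitrary here, PCA-derived in SliceGPT.
theta = 0.3
q = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]

# Rotating the hidden dimension leaves the product unchanged: (W1 Q)(Q^T W2) = W1 W2.
rotated = matmul(matmul(w1, q), matmul(transpose(q), w2))
original = matmul(w1, w2)
max_diff = max(abs(r - o) for rr, oo in zip(rotated, original) for r, o in zip(rr, oo))
print(max_diff < 1e-9)

# Slicing: keep only the first column of W1 Q and the first row of Q^T W2.
w1_sliced = [row[:1] for row in matmul(w1, q)]  # shape 2x1
w2_sliced = matmul(transpose(q), w2)[:1]        # shape 1x2
approx = matmul(w1_sliced, w2_sliced)           # smaller weights, approximate product
print(len(w1_sliced[0]), len(w2_sliced))
```

The rotation step is exact, while the slicing step trades accuracy for smaller weight matrices. Choosing Q so that the discarded directions carry the least signal is what keeps that loss small.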

Engine

  • Simplify the engine config

Fix

  • GenAIModelExporter: On Windows, the cache_dir path of the GenAI model exporter could exceed 260 characters.

Olive-ai 0.5.1

07 Apr 08:18

Examples

The following examples are added

  • Mistral FP16. #980
  • Phi2 Fine tuning example. #1030

Passes (optimization techniques)

  • QNNPreprocess: Add the config options introduced in the onnxruntime nightly package.
  • GptqQuantizer: PTQ quantization using Hugging Face Optimum and export model with onnxruntime optimized kernel.
  • OnnxMatMul4Quantizer: Add matmul RTN/HQQ/GPTQ quant configs.
  • Move all passes that need to create an inference session to run on the target:
    • IncQuantization
    • OptimumMerging
    • OrtTransformersOptimization
    • VitisAIQuantization
    • OrtPerfTuning

Engine

  • Support to pack AzureML output.
  • Remove execution_providers from engine config. A typical config now looks like:
"systems": {
    "local_system": {
        "type": "LocalSystem",
        "config": {
            "accelerators": [
                {
                    "device": "gpu",
                    "execution_providers": [
                        "CUDAExecutionProvider"
                    ]
                }
            ]
        }
    }
},
"engine": {
    "host": "local_system",
    "target": "local_system"
}
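
A minimal sketch of how such a config could be sanity-checked is shown below. The fragment is wrapped in braces so that it parses as a complete JSON object. The function undefined_system_refs is a hypothetical helper, not part of Olive. It only checks that the engine's host and target name systems that are actually defined.

```python
import json

# Config fragment from the release notes, wrapped in braces to form a complete JSON object.
CONFIG_TEXT = """
{
    "systems": {
        "local_system": {
            "type": "LocalSystem",
            "config": {
                "accelerators": [
                    {"device": "gpu", "execution_providers": ["CUDAExecutionProvider"]}
                ]
            }
        }
    },
    "engine": {"host": "local_system", "target": "local_system"}
}
"""


def undefined_system_refs(config: dict) -> list:
    """Hypothetical check: engine host/target names that are missing from "systems"."""
    systems = config.get("systems", {})
    engine = config.get("engine", {})
    return [key for key in ("host", "target") if engine.get(key) not in systems]


config = json.loads(CONFIG_TEXT)
print(undefined_system_refs(config))
```

In this layout the execution providers live with the accelerator entry of the system, and the engine section only refers to systems by name.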

Workflows

  • Delay Python pass module loading and provide the --package-config option so that advanced users can write their own pass modules and declare the corresponding dependencies.

Fix

  • Fix failure to load MLFlow models when from_pretrained_args is missing.
  • LoRA: Pass save_embedding_layers=False when saving the peft model. Otherwise, it defaults to "auto", which checks whether the vocab size changed.
  • Update the model_rank file for the zipfile packaging type. The model path is now relative to the output zip file.
  • Fix shutil.which returning None on Windows when passed a full Python path.

Olive-ai 0.5.0

07 Mar 23:48

Examples

The following examples are added:

Passes (optimization techniques)

New Passes

  • PyTorch
    • Introduce GenAIModelExporter pass to export a PyTorch model using GenAI exporter.
    • Introduce LoftQ pass which performs model fine-tuning using the LoftQ initialization proposed in https://arxiv.org/abs/2310.08659.
  • ONNXRuntime
    • Introduce DynamicToFixedShape pass to convert dynamic shape to fixed shape for ONNX model.
    • Introduce OnnxOpVersionConversion pass to convert an existing ONNX model with another target opset.
    • [QNN-EP] Add the prepare_qnn_config: bool option for quantization under QNN-EP, where int16/uint16 are supported for both weights and activations.
    • [QNN-EP] Introduce QNNPreprocess pass to preprocess the model before quantization.
  • QNN
    • Introduce QNNConversion pass to convert models to QNN C++ model.
    • Introduce QNNContextBinaryGenerator pass to generate the context binary from a compiled model library using a specific backend.
    • Introduce QNNModelLibGenerator pass to compile the C++ model into a model library for the desired target.
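
To make the DynamicToFixedShape entry above concrete, the sketch below replaces symbolic dimensions with fixed values. It models shapes as plain Python lists rather than ONNX tensors, and the helper fix_shape and the dimension names are hypothetical, not the pass's actual interface.

```python
from typing import Dict, List, Union

Dim = Union[int, str]  # int for a fixed dimension, str for a symbolic one


def fix_shape(shape: List[Dim], dim_values: Dict[str, int]) -> List[int]:
    """Hypothetical helper: replace symbolic dimensions with fixed values."""
    fixed = []
    for dim in shape:
        if isinstance(dim, str):
            # A symbolic dimension must have a fixed value supplied by the caller.
            if dim not in dim_values:
                raise KeyError(f"no fixed value provided for symbolic dimension {dim!r}")
            fixed.append(dim_values[dim])
        else:
            fixed.append(dim)
    return fixed


print(fix_shape(["batch_size", 3, "height", "width"],
                {"batch_size": 1, "height": 224, "width": 224}))
```

Raising an error for an unmapped symbolic dimension is a design choice of this sketch, so that no dimension is left dynamic by accident.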

Updates

  • OnnxConversion
    • Support both past_key_values.index.key/value and past_key_value.index.
  • OptimumConversion
    • Provide parameter components if the user wants to export only some models such as decoder_model and decoder_with_past_model.
    • Uses the default exporter args and behavior of the underlying optimum version. For versions 1.14.0+, this means legacy=False and no_post_process=False. User must provide them using extra_args if legacy behavior is desired.
  • OpenVINO
    • Upgrade OpenVINO API to 2023.2.0.
  • OrtPerfTuning
    • Add tunable_op_enable and tunable_op_tuning_enable for the ROCM EP to improve performance.
  • LoRA/QLoRA
    • Support bfloat16 with ort-training.
    • Support resuming training from checkpoint by
      • resume_from_checkpoint option.
      • overwrite_output_dir option.
  • MoEExpertsDistributor
    • Add option to configure number of parallel jobs.

Engine

  • For Zipfile packaging, add a models rank JSON file. This file ranks all output models from different EPs and includes model_config and metrics.
  • Add Auto Optimizer, a tool that automatically searches for combinations of Olive passes.

System

  • Add hf_token support for Olive systems.
  • AzureMLSystem
    • The Olive config file is uploaded to AML jobs under the codes folder.
    • Support adding tags to the AML jobs.
    • Support using existing AML workspace Environment for AzureMLSystem.
  • DockerSystem
    • Support running Olive Pass.
  • PythonEnvironmentSystem requires Olive to be installed in the environment. It can run passes and evaluate models.
  • New IsolatedORTSystem introduced that only supports evaluation of ONNX models. It requires onnxruntime to be installed in the environment. Can be used for packages like onnxruntime-qnn, which can only run in a Windows ARM64 Python environment.

Data

  • Add AML resource support for data configs.
  • Add audio classification data preprocess function.

Model

  • Rename model_loading_args to from_pretrained_args in hf_config.

Metrics

  • Add throughput metric support.

Dependencies:

Support onnxruntime 1.17.1.