Dates are in YYYY-MM-DD format.
- Added `DataType.FLOAT4` for 4-bit floats (E2M1).
- Added an option to emit logs using the Python `logging` module.
- Added `runtime_platform` to `CreateConfig` for TensorRT and a corresponding `--runtime-platform` command-line option.
- Added TensorRT 10.1 weight streaming V2 APIs.
- Added TensorRT 10.1 runtime device memory V2 APIs.
- Changed the meaning of the `TrtRunner`'s weight streaming budget argument.
- Added an `EngineFromPath` loader to deserialize an engine directly from disk. This will save CPU memory when weight streaming is enabled.
- Fixed a memory leak in `TrtRunner` caused by creating a new output allocator per inference.
- Fixed a bug where the `Calibrator` would not force non-index inputs to FP32; this is required by TensorRT.
- Added a `run_opts` argument to `tools.main` to allow calling Polygraphy tools from within other Python scripts.
- Updated the weight streaming flag to accept a percentage.
- Fixed a bug where a large amount of mismatch between two runner outputs would generate an out-of-memory error.
- Added `--mark-debug` command-line option.
- Initialized the TensorRT plugin library by default when using the TensorRT ONNX parser.
- Added a `plugin match` subtool that finds opportunities for plugin substitution in an ONNX model and prepares an intermediate file to be used for the actual substitution.
- Added a `plugin list` subtool that lists opportunities for plugin substitution without preparing an intermediate file.
- Added support for building engines with the refittable weights stripped. Setting the `strip_plan` parameter of `CreateConfig` or passing the `--strip-plan` flag enables building engines with the refittable weights stripped.
- Added a `plugin replace` subtool that replaces subgraphs with plugins, based on an intermediate file (`config.yaml`).
- Added `polygraphy surgeon weight-strip` to strip the initializers of selected nodes in an ONNX model.
- Added `polygraphy surgeon weight-reconstruct` to read a weightless ONNX model and fill the empty initializers with proxy tensors.
- Added `--weight-streaming` and `--weight-streaming-budget` options to control TRT weight streaming.
- Fixed a bug where `explicit_batch` would be provided by default on TRT 10.0, where it has been removed.
- Added an `allocation_strategy` to `TrtRunner` and a corresponding `--allocation-strategy` argument to CLI tools.
- Fixed a bug where the reference count of the TensorRT engine would not be decremented correctly in `TrtRunner`.
- Fixed a bug where the comparator would modify the original output tensor in some cases instead of operating on a copy.
- Fixed a bug where `is_installed()` for lazily imported modules would not work if the package name differed from the module name.
- Improved error messages in the default data loader for invalid backend modules.
- Added `DataType.INT4` for 4-bit signed integers.
- Removed internal usage of several deprecated TensorRT APIs.
- Fixed a bug in the default data loader where scalars would not be generated correctly.
- Added `--profiling-verbosity` command-line option.
- Added a `progress_monitor` parameter to `CreateConfig`.
- Added a `data_loader_backend_module` parameter to `DataLoader` and a corresponding `--data-loader-backend-module` argument to CLI tools to choose between generating `numpy.ndarray` and `torch.tensor` in the default data loader.
- Fixed a bug where warnings would be issued for unsupported versions of `torch` even if `torch` was not being used.
- Added a `check lint` subtool that validates ONNX models and generates human-readable console output and a JSON report detailing unused or invalid nodes as well as model errors.
- Added a new `inspect sparsity` subtool that can check whether the weights in a model are sparse.
- Added a `disable_compilation_cache` parameter to `CreateConfig` and a corresponding `--disable-compilation-cache` argument to CLI tools.
- Added a `"quantile"` mode to `CompareFunc.simple()`'s `check_error_stat` parameter.
- Added an `error_quantile` parameter to `CompareFunc.simple()` and a corresponding `--error-quantile` argument to CLI tools to specify the error quantile when `check_error_stat="quantile"`.
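  As a rough illustration of what quantile-based error checking means (the helper names below are made up for this sketch and are not Polygraphy's internals): instead of requiring the *maximum* per-element error to be under tolerance, the check requires that a chosen quantile of the errors is under it, so a few outliers no longer fail the comparison.

  ```python
  # Illustrative sketch of "quantile" error checking (not Polygraphy's code).

  def quantile(values, q):
      """Linear-interpolation quantile of a list of numbers, 0 <= q <= 1."""
      xs = sorted(values)
      pos = q * (len(xs) - 1)
      lo = int(pos)
      hi = min(lo + 1, len(xs) - 1)
      return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

  def check_quantile_error(expected, actual, atol, error_quantile=0.99):
      # Pass if the chosen quantile of absolute errors is within tolerance,
      # even when a handful of elements exceed it.
      absdiff = [abs(e - a) for e, a in zip(expected, actual)]
      return quantile(absdiff, error_quantile) <= atol
  ```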
- Improved error messages when deserialization fails due to a missing module.
- Changed `SaveOnnx` to use `external_data_path=""` when the model exceeds the protobuf size limit and no external data path is provided. This prevents scenarios where a long-running command like `surgeon sanitize --fold-constants` would finally complete only to fail when attempting to save the final model. With the new behavior, the model will be saved successfully with external data in a default location.
- Updated the default `DataLoader` to show better errors when it can't generate inputs due to data types unsupported by NumPy. In such cases, you must provide a custom data loader.
- Fixed a bug where older versions of ONNX would cause failures due to missing data types.
- Fixed a bug where the top-K implementation would not work for PyTorch FP16 tensors on CPU.
- Added a `quantization_flags` parameter to `CreateConfig` and a corresponding `--quantization-flags` argument to CLI tools to enable setting TensorRT builder quantization flags.
- Added an `error_on_timing_cache_miss` parameter to `CreateConfig` and a corresponding `--error-on-timing-cache-miss` argument to CLI tools.
- Added a `bf16` option to the TensorRT `CreateConfig` loader and a corresponding `--bf16` argument to CLI tools.
- Added a common `DataType` class which can convert between the data type classes of various other frameworks, like NumPy, PyTorch, and TensorRT.
- Added support for PyTorch tensors in `TrtRunner` and `Calibrator`. See the example for details.
- Added support for PyTorch tensors in `OnnxrtRunner`.
- Added a `strongly_typed` option to TensorRT network loaders and a corresponding `--strongly-typed` argument to CLI tools.
- Added `polygraphy surgeon prune` to prune a model to be sparse. Note that this will not retain the accuracy of the model and should hence be used only for functional testing.
- Added a `--toposort` option to `surgeon sanitize` to topologically sort nodes.
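  A common data type class like the one described above can be pictured as a hub-and-spoke registry: one canonical type name with per-framework lookups, so conversions don't need a separate map for every framework pair. The sketch below is purely illustrative (class layout and registry contents are assumptions, not Polygraphy's actual `DataType` implementation):

  ```python
  # Toy sketch of a framework-agnostic data type registry (illustrative only).

  class DataType:
      # Canonical name -> per-framework type names.
      _REGISTRY = {
          "float32": {"numpy": "float32", "torch": "torch.float32", "tensorrt": "DataType.FLOAT"},
          "float16": {"numpy": "float16", "torch": "torch.float16", "tensorrt": "DataType.HALF"},
          "int8":    {"numpy": "int8",   "torch": "torch.int8",    "tensorrt": "DataType.INT8"},
      }

      def __init__(self, name):
          if name not in self._REGISTRY:
              raise ValueError(f"Unknown data type: {name}")
          self.name = name

      def to(self, framework):
          # Convert the canonical type to the requested framework's name.
          return self._REGISTRY[self.name][framework]
  ```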
- `TensorMetadata` will now automatically convert data types to Polygraphy's `DataType` class. The data types can be converted to corresponding NumPy types using the `numpy()` method. This affects any interface that returns instances of this class. For example, the `Calibrator` sets input metadata on the provided data loader in the form of a `TensorMetadata` instance. NOTE: For compatibility, the runner method `get_input_metadata` will retain its previous behavior of using NumPy types. In a future version of Polygraphy, it will be updated to return only Polygraphy `DataType`s.
- Fixed a bug where the `DataLoader` would not generate inputs correctly if a scalar was provided in the input metadata. Note that since scalars have fixed shapes, they do not need to be specified via the `input_metadata` argument, so a workaround on older versions is to simply omit scalars.
- Removed the TensorRT Legacy runner, which supported UFF and Caffe models.
- Removed support for TensorRT versions older than 8.5.
- Removed the `max_workspace_size` option in TensorRT's `CreateConfig` and the corresponding `--workspace` argument from CLI tools.
- Removed the `strict_types` option in TensorRT's `CreateConfig` and the corresponding `--strict-types` argument from CLI tools.
- Removed the `debug diff-tactics` alias for `inspect diff-tactics`. `diff-tactics` is now available only under the `inspect` tool.
- Updated `TrtOnnxFlagArgs` to automatically enable `NATIVE_INSTANCENORM` when either hardware or version compatibility is enabled in the builder configuration.
- Downgraded errors for extra layers/tensors in `SetLayerPrecisions`, `SetTensorDatatypes`, and `SetTensorFormats` to warnings.
- Added experimental support for error heatmaps. These can be visualized and/or saved with the `--show-heatmaps`/`--save-heatmaps` command-line options or the `show_heatmaps`/`save_heatmaps` arguments to `CompareFunc.simple`.
- Added experimental `--show-error-metrics-plot`/`--save-error-metrics-plot` command-line options and corresponding `show_error_metrics_plot`/`save_error_metrics_plot` arguments to `CompareFunc.simple`. These allow you to generate plots of error vs. magnitude.
- Added a `--version-compatible` flag for building version-compatible engines. Note that when building version-compatible engines for ONNX models, `--onnx-flags native_instancenorm` must also be provided.
- Added `TrtSaveEngineBytesArgs` and `TrtLoadEngineBytesArgs` to allow for avoiding engine deserialization until necessary.
- Added an `exclude_lean_runtime` parameter to `CreateConfig` and a corresponding `--exclude-lean-runtime` CLI option.
- Added a `runtime` parameter to `EngineFromBytes` and `EngineFromNetwork` to enable deserializing plans with a custom runtime.
- Added a `LoadRuntime` TensorRT loader that can be used to load a runtime from a path, and a corresponding `--load-runtime` CLI option.
- Updated Polygraphy to warn when it detects unsupported TensorRT and NumPy version combinations.
- `TrtSaveEngineArgs` and `TrtLoadEngineArgs` now depend on `TrtSaveEngineBytesArgs` and `TrtLoadEngineBytesArgs` respectively. Additionally, all command-line options have been migrated to the latter argument groups.
- Fixed a bug in `debug precision` where Polygraphy would attempt to set the layer precision for layers producing non-activation outputs, which is an error in TensorRT.
- Fixed minor formatting issues in help text.
- `CompareFunc.simple` will now add a small epsilon when computing relative error to avoid Infs/NaNs.
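  The epsilon trick mentioned above keeps relative error finite when the reference value is zero. A minimal sketch of the idea (the function name and epsilon value are assumptions for illustration, not Polygraphy's exact formula):

  ```python
  import math

  EPS = 1e-12  # hypothetical epsilon value

  def relative_error(expected, actual, eps=EPS):
      # Without eps, a zero reference value would divide by zero and
      # produce Inf/NaN; with it, the result is always finite.
      return abs(expected - actual) / (abs(expected) + eps)
  ```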
- `polygraphy run` will now print warnings when command-line options are provided for comparison function types other than the current one specified by `--compare-func`.
- Added a `TensorInfo` class to the TensorRT backend to track information from `IAlgorithmIOInfo`. The `Algorithm` class now keeps `TensorInfo`s instead of tuples.
- Changed the format of tactic replay files to include more information about tensor formats where possible. NOTE: This means that tactic replay files generated with previous versions of Polygraphy are not compatible with this version!
- Fixed a bug where the `--trt-legacy` runner would not work with `--input-shapes` specified.
- Fixed a bug where `debug reduce` would not work correctly for models where a node had multiple outputs which were also graph outputs. See the comment in `reduce.py` for details.
- Updated comparison functions so that the output array is now displayed in addition to the histogram rather than instead of it.
- Updated comparison functions to display the entire output array instead of a histogram if it is small enough.
- Added `max_aux_streams` to `CreateConfig` for TensorRT and a corresponding `--max-aux-streams` command-line option.
- Added support for HWC I/O formats in `TrtRunner` for TensorRT 8.6+.
- Added an `-n`/`--num-items` option to `inspect data` to control how many elements of an array are shown.
- Added a `--line-width` option to `inspect data` to control how many characters are displayed per line when showing an array.
- Fixed a bug where `CreateConfig` was not included in the API documentation.
- Reimplemented the calibrator restriction so inputs always use the `OPT` shape, as TensorRT does not currently support using other shapes during calibration.
- Removed various deprecated APIs:
    - The `input_metadata` parameter in `Calibrator.reset()`. Now, `Calibrator.set_input_metadata()` implements this functionality and generally does not need to be called manually.
    - The `version` parameter in `mod.lazy_import()`. The version can now be specified as part of the package name. For example, `tensorrt>=8.0`.
    - The `--timing-cache` CLI argument. This is replaced by `--load-timing-cache` and `--save-timing-cache`.
    - Legacy CLI shape syntax. See the CHANGELOG entry for v0.32.0 for details.
- Added an `fp8` parameter to `CreateConfig` for TensorRT and a corresponding `--fp8` command-line option.
- Added `hardware_compatibility_level` to `CreateConfig` for TensorRT and a corresponding `--hardware-compatibility-level` command-line option.
- Fixed a bug where the calibrator would not accept inputs with a shape other than the `OPT` shape set in the profile.
- Fixed a bug where paths on Windows including drive letters would not be parsed correctly.
- Added an experimental `PostprocessConfig` loader to edit the TensorRT `IBuilderConfig`s generated by Polygraphy, and a corresponding CLI option, `--trt-config-postprocess-script`.
- Updated `TrtRunner` and associated utilities to use the TensorRT v3 inference APIs exclusively when available. Previous changes only used the v3 APIs in the `infer()` method.
- Updated all arguments that take a script to allow the function name to be specified with the script argument instead of with a separate option. For example, you can now write `--data-loader-script my_script.py:my_func` instead of `--data-loader-script my_script.py --data-loader-func-name my_func`.
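  Parsing the combined `script:function` syntax has a subtlety worth noting: splitting on the *last* colon keeps Windows paths with drive letters (e.g. `C:\tools\loader.py`) intact. The helper below is a hypothetical sketch of that parsing, not Polygraphy's actual implementation, and the default function name is an assumption:

  ```python
  DEFAULT_FUNC = "load_data"  # assumed default function name, for illustration

  def parse_script_arg(arg):
      # Split on the last colon so Windows drive letters are not mistaken
      # for the script/function separator.
      path, sep, func = arg.rpartition(":")
      if not sep or (len(path) == 1 and path.isalpha()):
          # No colon at all, or the only colon belongs to a drive letter:
          # the whole argument is the script path.
          return arg, DEFAULT_FUNC
      return path, func
  ```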
- Fixed a bug where `TrtRunner.get_input_metadata()` would crash in older versions of TensorRT when no optimization profile was set. Now, an error message is emitted instead.
- Fixed an issue where immediately evaluated loaders would display a generic function name in error messages instead of the name of the loader being called.
- Fixed a bug where setting `preview_features` in `CreateConfig` would not disable default preview features even when they were not part of the provided preview feature list.
- Fixed a bug due to API breakages in `importlib_metadata`.
- Added a new `template onnx-gs` subtool that allows you to generate template files for Python scripts to modify an ONNX model using ONNX-GraphSurgeon.
- Added a new `PostprocessNetwork` loader and a corresponding `--trt-network-postprocess-script` command-line flag to allow for postprocessing a loaded TensorRT network using a user-defined function.
- Updated the working with run results example to demonstrate how to work with saved input data as well.
- Updated CLI subtools to report duration of the command by default.
- Improved error messages in the `ExtractSubgraph` loader, and consequently `surgeon extract`, when the extracted subgraph is invalid due to tensors without producers not being marked as graph inputs.
- Updated the adding-precision-constraints run example to demonstrate how to use the `--trt-network-postprocess-script` flag to apply precision constraints to a parsed TensorRT network.
- Fixed a bug where specifying the same runner multiple times would have no effect.
- Fixed a bug where `Script` would close `stdout` after writing to it.
- Fixed a bug where an API change in `onnxmltools` caused a breakage in the `ConvertToFp16` loader.
- Fixed a bug where `debug precision` would not respect the layer precisions set by `--layer-precisions`.
- Added a how-to guide on using `debug` subtools effectively.
- Added a how-to guide on using `debug reduce` effectively and refactored the `debug reduce` example README to omit information that's now captured in the how-to guide.
- Added a how-to guide on using custom inference input data.
- Moved the `diff-tactics` tool to `inspect`. `debug diff-tactics` is now an alias for `inspect diff-tactics` to preserve backwards compatibility.
- Updated command-line parsing logic to handle `inf`, `-inf`, and `nan`. These can now be specified to options that expect floating-point numbers, such as `--val-range`.
- Fixed a bug where `Profile.fill_defaults` would not work correctly for scalar shape-tensor network inputs.
- Fixed a bug where `lazy_import` would repeatedly call `importlib.import_module`.
- Updated Polygraphy's TensorRT algorithm selectors to keep track of strides in addition to tensor formats and data types.
- Fixed a bug where marking output tensors in a TensorRT network containing layers with omitted optional inputs would raise an exception.
- Added a `SetTensorDatatypes` loader and a corresponding `--tensor-datatypes` command-line flag to allow for setting per-tensor data types in TensorRT networks.
- Added a `direct_io` parameter to `CreateConfig` and a corresponding `--direct-io` command-line flag.
- Added a `SetTensorFormats` loader and a corresponding `--tensor-formats` command-line flag to allow for setting per-tensor `allowed_formats` in TensorRT networks.
- Added an experimental, untested API, `FormattedArray`, to allow `TrtRunner` to support vectorized formats. This API will be documented, tested, and improved in a future release. Until then, use with caution!
- Updated `InferShapes` to use ONNX-Runtime's shape inference utilities where possible, as they are more performant and memory-efficient than the ONNX shape inference utilities. This new behavior, which is on by default, can be disabled by setting `allow_onnxruntime=False` or using the `--no-onnxruntime-shape-inference` CLI option.
- Fixed a bug in `inspect model` where dimensions of ONNX tensors without `dim_value` or `dim_param` set would be shown as `0`s instead of `-1`s.
- Removed the `explicit_precision` parameter from TensorRT network loaders.
- Updated `surgeon extract` and `debug reduce` to no longer attempt to retrieve values for all intermediate tensors when running fallback shape inference. Instead, only the minimum required tensors are retrieved. This greatly reduces the memory requirements for large models where fallback shape inference is required.
- Fixed a bug where the logger would display incorrect file paths and line numbers when `line_info` was enabled.
- Added a new example showing how to work with `RunResults` using the API.
- Added a `size_threshold` option in `FoldConstants` and a corresponding `--fold-size-threshold` CLI option, which allows for skipping constant folding on operations which would generate constants larger than the provided threshold.
- Added support for data-dependent shapes with TensorRT 8.5 by updating `TrtRunner` to use the new `execute_async_v3` and related APIs when possible.
- Updated the `Calibrator` with a `set_input_metadata()` method and deprecated the `input_metadata` parameter in `reset()`.
- Updated `CreateConfig` to no longer implicitly `reset()` the calibrator. Resetting is only required if the calibrator is used across multiple different networks.
- Updated `EngineFromNetwork` and related loaders to set input metadata on Polygraphy calibrators. This way, metadata is set even when using a TensorRT `IBuilderConfig` created outside Polygraphy.
- Removed support for ONNX-GraphSurgeon `0.3.20` and older in `FoldConstants`.
- Removed support for TensorRT 6.0 and earlier.
- Updated the logger to perform less processing on messages to speed up logging.
- Fixed an issue where `ctrl-C` would be caught and ignored by TensorRT's logger. Polygraphy will now generate a `SIGTERM` signal when a `KeyboardInterrupt` is triggered.
- Added a `SetLayerPrecisions` loader to set layer compute precisions in TensorRT networks, and a corresponding `--layer-precisions` CLI option.
- Added an `ignore_external_data` option to `OnnxFromPath` and a corresponding `--ignore-external-data` option so that it's still possible to manipulate the model to some degree if external weights are missing.
- Updated Polygraphy's TensorRT logger implementation for TensorRT 8.0 and newer to redirect to the Polygraphy global logger. This enables Polygraphy logging features like colored output and redirection to a log file.
- Fixed a bug where the context manager created by `G_LOGGER.verbosity()` would not correctly revert the logging verbosity to its original value.
- Added per-path logging verbosity settings and a corresponding `--verbosity` command-line option.
- Added an `infinities_compare_equal` parameter to `CompareFunc.simple()`, as well as a corresponding `--infinities-compare-equal` command-line flag, to allow matching infinite values in outputs to have an absdiff of 0 for the purpose of comparison.
- Updated the `max_workspace_size` option in `CreateConfig` to match the default behavior of TensorRT instead of defaulting to 16 MiB.
- Added a TensorRT `engine_capability` parameter to `CreateConfig()` and a corresponding `--engine-capability` option to `polygraphy convert`.
- Updated `CompareFunc.simple()` to cast unsigned and low-precision arrays to signed, higher precisions. This prevents underflows/overflows during difference computations.
- Updated `copy_to` and `copy_from` in `cuda.DeviceView` and `cuda.DeviceArray` to no longer implicitly resize the host and device buffers.
- Updated `debug reduce` to automatically remove unused graph inputs to create more minimal models.
- Updated `func.extend()` to provide a mechanism to accept the input parameters of the extended function as parameters to the decorated function (see the docstring for details).
- Removed various deprecated APIs:
    - The `fold_constant` parameter in `OnnxFromTfGraph` (removed in the underlying library, `tf2onnx`).
    - `obey_precision_constraints` in `backend.trt.CreateConfig` (replaced by `precision_constraints`).
    - `basic_compare_func` in `CompareFunc` (replaced by `CompareFunc.simple`).
    - `topk_func` in `PostprocessFunc` (replaced by `PostprocessFunc.top_k`).
    - The `axis` parameter in `top_k` (`axis` can be specified as part of the `k` parameter).
    - The `--mode` option in `inspect model` (replaced by `--show`).
- Added a `preview_features` parameter to `CreateConfig` and a corresponding `--preview-features` argument to CLI tools to enable TensorRT preview features.
- Added support for building refittable engines in Polygraphy. Setting the `refittable` parameter of `CreateConfig` or passing the `--refittable` flag enables building refittable engines.
- Passing a nonexistent file to Polygraphy's `--load-timing-cache` option or the `load_timing_cache` parameter of `CreateConfig` is no longer a fatal error. Polygraphy will now warn the user and fall back to using an empty timing cache.
- Fixed broken links in documentation.
- Added an `optimization_profile` parameter to the constructor of `TrtRunner` to allow for setting an optimization profile index whenever the runner is activated.
- Added an `--optimization-profile` argument to CLI tools to allow for setting the optimization profile to use for inference.
- Rewrote the examples on comparing frameworks and comparing across runs to provide more detailed use cases and tips.
- Added new examples:
    - An example on how to use the `run` subtool's `--validate` flag to check for intermediate NaN/infinity outputs.
    - An example on how to selectively constrain layer precisions in a model using Polygraphy network scripts.
    - An example on how to use the `convert` subtool's `--fp-to-fp16` flag to convert an ONNX model to FP16.
- Added an extensibility interface for `polygraphy run` along with a new example demonstrating how to write an extension module.
- Added `--load-debug-replay` and `--save-debug-replay` options to various `debug` subtools. This allows you to save your progress when debugging and resume from that point later.
- Added a how-to guide on working with reduced-precision optimizations using Polygraphy.
- The `fold_constant` parameter in `OnnxFromTfGraph` has been deprecated since the corresponding parameter in `tf2onnx` was removed, and will be removed in a future release.
- The `explicit_precision` parameter to `CreateNetwork` has been deprecated and will be removed in a future release.
- Updated Polygraphy wheels to use `entry_points` instead of `scripts`, which should improve cross-platform support.
- Updated `debug` subtools to support an interactive mode. When no `--check` command is provided, the tools will now interactively prompt you during each iteration to determine whether it passed or failed.
- Removed support for Python 3.5 and older.
- Added a new example for `inspect capability`.
- Updated `polygraphy debug` subtools which use `--check` commands to log the duration of each iteration of the command.
- Updated the CUDA wrapper to search more paths on Linux for the CUDA runtime library.
- Updated the calibrator to check the data type and shape of data provided by the data loader in cases where input metadata is available.
- Added support for a top-K axis argument in `--postprocess`.
- Extended `PostprocessFunc.topk_func()` to support per-output axes.
- Added a `precision_constraints` argument to `CreateConfig` and a corresponding `--precision-constraints` CLI argument.
- Added a generic `--postprocess` CLI option which can apply different post-processing functions to different output tensors.
- Added an experimental `--compare-func-script` option to allow for custom comparison functions in the CLI.
- Added a new `indices` comparison function and a corresponding entry to `--compare-func` to work with outputs including indices, e.g. class indices in image classification.
- Deprecated the `obey_precision_constraints` option in `CreateConfig` and the corresponding `--obey-precision-constraints` CLI argument.
- Made `--obey-precision-constraints`, `--precision-constraints`, and `--strict-types` mutually exclusive options.
- Fixed `CompareFunc.simple()` so that `NaN`s in the output always result in mismatches.
- Added DLA information to build configuration summary.
- Added warnings when the `--load-inputs` or `--data-loader-script` options are used together with options only applicable to the default data loader.
- Updated the included `Calibrator` to no longer require manual activation before use with TensorRT.
- Fixed a bug in `debug reduce` where models with multiple branches would not be correctly reduced when using custom input data.
- Updated `inspect model` to show ONNX graph docstrings when `--show attrs` is set.
- Fixed a bug in the `TrtLegacyRunner` where attempting to use Polygraphy's calibrator could sometimes result in a crash.
- Added an `ASK_BEFORE_INSTALL` option to `polygraphy.config` and a corresponding environment variable, `POLYGRAPHY_ASK_BEFORE_INSTALL`. This option will cause Polygraphy to prompt before automatically installing any packages. It is only relevant if `POLYGRAPHY_AUTOINSTALL_DEPS` is set.
- Added `--load-timing-cache`/`--save-timing-cache` options to be more consistent with other options. The `--timing-cache` argument will be deprecated in favor of these.
- Updated `EngineBytesFromNetwork` to append to existing timing caches instead of overwriting them. Additionally, the write operation is now atomic so that multiple processes can safely write to the same cache.
- Updated `CreateConfig` to read from timing caches atomically.
- Fixed a bug in `surgeon insert` where tensors connected to multiple consumers would not be correctly replaced.
- Fixed a bug in `inspect model` where tensors appearing more than once in a node's inputs or outputs would only be displayed once.
- Fixed a bug in `inspect model` where the ONNX opset would not be displayed correctly for models using multiple opsets.
- Fixed an edge case where Polygraphy's lazy importer would not ignore modules that are installed but cannot be imported. Now, modules that cannot be imported for any reason are treated as though they are not available.
- Added a `providers` parameter to `SessionFromOnnx` to specify execution providers for ONNX-Runtime, and a corresponding `--providers` argument to CLI tools.
- `CompareFunc.simple()` will now correctly display the minimum required tolerances when using `elemwise` mode. Note that in elementwise comparison mode, each element of the output is compared against both tolerances, and only counted as a mismatch if both are exceeded. Hence, the minimum required tolerances apply if only one type of tolerance is being used. When both absolute and relative tolerance are set, the requirements may be lower.
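  The elementwise rule described above (mismatch only when *both* tolerances are exceeded) can be sketched as follows; the function and its names are illustrative, not Polygraphy's internals:

  ```python
  # Illustrative sketch of elementwise comparison: an element is a mismatch
  # only if it exceeds BOTH the absolute and the relative tolerance.

  def count_mismatches(expected, actual, atol, rtol):
      mismatches = 0
      for e, a in zip(expected, actual):
          absdiff = abs(e - a)
          reldiff = absdiff / abs(e) if e != 0 else float("inf")
          if absdiff > atol and reldiff > rtol:
              mismatches += 1
      return mismatches
  ```

  A large-magnitude element can pass via the relative tolerance while a small-magnitude one passes via the absolute tolerance, which is why using both tolerances together can accept outputs that either alone would reject.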
- Added a `memory_pool_limits` parameter to `CreateConfig`.
- Added a `--pool-limit`/`--memory-pool-limit` argument to command-line tools.
- Changed the default base calibrator class to `IInt8EntropyCalibrator2`, since it works across both GPU and DLA. To preserve the old behavior, specify `--calibration-base-class=IInt8MinMaxCalibrator` on the command line or specify the `BaseClass` argument in `Calibrator` in the Python API.
- Deprecated the `--workspace` command-line option and the `max_workspace_size` parameter in `CreateConfig`. Use `--pool-limit` and `memory_pool_limits` respectively instead.
- Removed the deprecated module `polygraphy.util.serde`. Use `polygraphy.json` instead.
- Removed the `--tactic-replay` command-line option. Use `--load-tactics`/`--save-tactics` instead.
- Added support for `MeanVarianceNormalization` to `PluginRefRunner`.
- Added a `profiling_verbosity` parameter to `CreateConfig()`.
- Added support for displaying layer-level engine information in `inspect model` for newer versions of TensorRT.
- Added a new `add()` API to `RunResults` to make it easier to create custom output data. Added a new example to demonstrate how to use this API.
- Deprecated the `--mode` option in `inspect model`; a new `--show` option has been introduced which can be used to individually control what is displayed.
- Command-line tools will now use `IInt8EntropyCalibrator2` for calibration if DLA and INT8 mode are enabled, since the default does not work with DLA.
- Removed several deprecated submodules of `polygraphy.common`: `constants`, `cuda`, `exception`, `func`. These can now be found under the top-level `polygraphy` module instead.
- Improved the help messages of various subtools, including `run`, `debug build`, and `debug reduce`.
- Added a default value for `--artifacts-dir` in `debug` subtools.
- Fixed a bug in `surgeon insert` where the data types of graph output tensors would not be preserved.
- Fixed broken links in various READMEs.
- Added an `OnnxFromBytes` loader that can deserialize ONNX models.
- Added an `obey_precision_constraints` argument to `CreateConfig` and a corresponding `--obey-precision-constraints` CLI argument.
- Deprecated the `strict_types` option in `CreateConfig` and the corresponding `--strict-types` CLI argument.
- Added various examples, a CLI User Guide, and a directory for how-to guides.
- Added an experimental `template trt-config` tool to generate template scripts that create TensorRT builder configurations.
- Added `--hide-fail-output` to make `debug` subtools suppress output from failed iterations.
- Added experimental support for DLA.
- Added a `data to-input` tool that can combine inputs/outputs created by `--save-inputs`/`--save-outputs`. The resulting file is compatible with `--load-inputs`.
- Updated `debug` subtools to show captured output on failed iterations.
- The logger will now emit all `CRITICAL` messages to `stderr` instead of `stdout`.
- Renamed `CompareFunc.basic_compare_func` to `CompareFunc.simple`. The old name is preserved as an alias.
- The `--good` and `--bad` arguments in `diff-tactics` can now also accept single files instead of directories.
- Fixed a bug where `debug reduce` would crash when ONNX models included `Constant` nodes whose outputs needed to be marked as model outputs.
- Added support for `K`, `M`, and `G` suffixes to CLI arguments that expect a number of bytes (e.g. `--workspace`). These correspond to `KiB`, `MiB`, and `GiB` respectively. For example, `--workspace=16M` is equivalent to `--workspace=16777216`.
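  Since the suffixes are binary (`KiB`/`MiB`/`GiB`), `16M` means 16 × 2²⁰ bytes. A small sketch of such suffix parsing (a hypothetical helper, not Polygraphy's code):

  ```python
  # Binary byte-count suffixes: K = 2**10, M = 2**20, G = 2**30.
  SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

  def parse_num_bytes(arg):
      # Accept a plain integer or an integer followed by K/M/G.
      if arg and arg[-1].upper() in SUFFIXES:
          return int(arg[:-1]) * SUFFIXES[arg[-1].upper()]
      return int(arg)
  ```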
- Added a `copy_outputs_to_host` parameter in `TrtRunner.infer()`, which, when set to `False`, will cause the runner to return `DeviceView`s instead of NumPy arrays for inference outputs. This allows us to avoid a device-to-host and host-to-device copy if we want outputs to remain on the device.
- Added a `view()` method to `DeviceArray`s to create read-only `DeviceView`s over their data.
- Added a `PluginRefRunner` which provides CPU reference implementations for TensorRT plugins and a corresponding `--pluginref` runner option in `polygraphy run`.
- Marked the old shape syntax (`<name>,dim0xdim1x...xdimN,<dtype>`) as deprecated, since it leads to ambiguity when parsing shapes that include named dynamic dimensions. For example, compare `--input-shapes input0,xxyxz` with `--input-shapes input0:[x,y,z]`. For now, the old syntax continues to work for shapes without named dimensions, but it will be removed in a future version of Polygraphy. The newer syntax, which was originally introduced in Polygraphy 0.25.0, uses the list syntax already present in other parts of Polygraphy. For example, `--val-range [0,1]` in `run` and `--attrs axes=[0,1]` in `surgeon insert` use the same syntax.
- Made several performance improvements in the Polygraphy CUDA wrapper.
- Added a loud warning when the deprecated `--int-min`/`--int-max` or `--float-min`/`--float-max` options are used. These are superseded by `--val-range`, which allows you to specify data ranges on a per-input basis.
- Removed various deprecated aliases: `ModifyOnnx`, `SessionFromOnnxBytes`, `ModifyNetwork`, `ModifyGraph`.
- Removed the `to-json` tool, which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON. Polygraphy 0.27.0 and later only support reading and writing data in JSON format.
- Removed the deprecated legacy submodule `polygraphy.util.misc`, which was just an alias for `polygraphy.util`.
- Improved the quality of several examples and added information on how to load serialized TensorRT engines as well as how to use custom input data.
- Added an `inspect capability` subtool that will partition an ONNX graph into supported and unsupported subgraphs for use within TensorRT.
- Added Windows support to the CUDA wrapper in `polygraphy/cuda/cuda.py`.
- `SaveOnnx` will now create parent directories if they do not already exist.
- Fixed a bug where `ExtractSubgraph` would modify the original graph instead of creating a new graph.
- Fixed various typos and added more details to some tool READMEs.
- Added `polygraphy.config` as a top-level import so that it no longer needs to be imported separately (i.e. `from polygraphy import config`).
- Fixed a bug where `surgeon sanitize` would not re-run shape inference after overriding model input shapes, causing constant folding to be sub-optimal.
- CLI tools will no longer print long stack traces on user error.
- Fixed a bug where `surgeon` subtools would not work with ONNX models without an `.onnx` extension.
- Fixed a bug where `surgeon insert` would not correctly run shape inference if the inserted node replaced the graph outputs.
- Fixed a bug where `POLYGRAPHY_AUTOINSTALL_DEPS` would not work correctly for nested modules, e.g. `mod.lazy_import("onnx.shape_inference")`.
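The nested-module case in the fix above is worth spelling out: a lazy-import proxy must resolve dotted names like `"onnx.shape_inference"` to the submodule, not just the top-level package. A minimal sketch (not Polygraphy's actual `mod.lazy_import` implementation), using `json.decoder` as a stand-in for a nested module:

```python
import importlib

class LazyModule:
    """Defers the real import until the first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails, i.e. for
        # attributes of the wrapped module, not _name/_module themselves.
        if self._module is None:
            # import_module handles dotted (nested) module names correctly.
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

json_decoder = LazyModule("json.decoder")  # nothing imported yet
print(json_decoder.JSONDecoder)            # triggers the deferred import
```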
- Added an `--ignore-fail-code` option to `debug` subtools to ignore certain types of failures.
- Added a highly experimental `OnnxLikeFromNetwork` loader that can generate a file using the ONNX format based on a TensorRT network. The resulting model is not valid ONNX, but is useful for visualization.
- Added an `onnx-like-trt-network` type in `convert` to generate ONNX-like models from TensorRT networks.
- Added support for custom installation commands during dependency autoinstall. This can be configured using `config.INSTALL_CMD` or by setting the `POLYGRAPHY_INSTALL_CMD` environment variable.
- Added support for loading external data in `InferShapes`.
- Added a `--no-per-pass-shape-inference` argument to `surgeon sanitize` to disable shape inference between constant-folding passes.
- Added an `--external-data-size-threshold` CLI option for saving external data for ONNX models.
- Added a `--no-save-all-tensors-to-one-file` CLI option to avoid saving ONNX external data to a single file.
- Improved logic for auto-permuting tensors in `basic_compare_func`. The new logic can handle an arbitrary number of dimensions. For example, if two tensors with shapes `(1, 3, 45, 45, 45)` and `(1, 45, 45, 45, 3)` are being compared, `basic_compare_func` will now guess that the latter should be transposed using a permutation of `(0, 4, 1, 2, 3)` to match the former.
- Improved display of `Profile` in logging messages.
- Updated NumPy array encoding to use `base64`. In some cases, this can reduce file sizes by a factor of 4.
- Updated the `debug precision` default direction to `forward` as this typically leads to better results.
- Added a `--no-strict-types` flag to `debug precision` in case strict types needs to be disabled for any reason.
- `FoldConstants` will no longer run shape inference if shape folding is disabled.
- `InferShapes` will now automatically write large models to the disk to work around the 2 GiB protobuf size limitation. The threshold can be configured using the `save_to_disk_threshold_bytes` parameter.
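The core of the auto-permute idea mentioned above can be sketched as a brute-force search for an axis permutation mapping one shape onto the other. The real `basic_compare_func` logic is more careful (e.g. about repeated dimension sizes); this shows only the search:

```python
from itertools import permutations

def guess_permutation(from_shape, to_shape):
    """Find an axis permutation that maps from_shape onto to_shape, or None."""
    for perm in permutations(range(len(from_shape))):
        if tuple(from_shape[i] for i in perm) == tuple(to_shape):
            return perm
    return None

# (1, 45, 45, 45, 3) matches (1, 3, 45, 45, 45) under permutation (0, 4, 1, 2, 3)
print(guess_permutation((1, 45, 45, 45, 3), (1, 3, 45, 45, 45)))  # (0, 4, 1, 2, 3)
```

Brute force is exponential in rank, which is fine for typical tensor ranks (< 8 axes) but is one reason a production implementation would prune by matching dimension sizes first.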
- Fixed a bug in `inspect model` where engine output bindings would all be printed on one line.
- Fixed a bug where using `set_profile` in the `TrtRunner` would sometimes cause input shape checks in `infer` to fail even when shapes were valid.
- Fixed a bug in `inspect model` where engine output bindings would display the wrong shapes for profiles after the first.
- Fixed a bug where `debug precision` would incorrectly mark constant layer outputs and non-execution tensors to run in higher precision.
- Fixed a bug where `debug precision` would crash if engine building failed. It now continues to the next iteration, counting the previous one as a failure.
- Fixed a bug where `InferShapes` would require `--external-data-dir` to be set even if the external data were in the same directory as the model.
- Fixed a bug where `--data-loader-script` would not provide data in the `run` tool if int8 calibration were enabled in TensorRT.
- Added a `--log-file` option to CLI tools to store logging output to a file.
- Added an `--iteration-info` argument to `debug` subtools so that `--check` commands can get information about the current iteration.
- Added an experimental `debug repeat` subtool, which is more generic than the existing `debug` subtools.
- Swapping NumPy arrays to the disk is now disabled by default. It can be re-enabled by setting the `POLYGRAPHY_ARRAY_SWAP_THRESHOLD_MB` environment variable.
- Added support for per-output `check_error_stat`, which allows different metrics to be checked for different outputs.
- Moved JSON utilities into a separate `polygraphy.json` submodule. For backwards compatibility, they remain accessible via `polygraphy.util` as well.
- The `max` value for `check_error_stat` in `basic_compare_func` now only checks the maximum absolute/relative tolerances. The previous behavior of checking the values element-wise is preserved in the `elemwise` option, which is now the default.
- Fixed a bug where the data loader would not cast value ranges provided for `bool` input types, which could lead to generating out-of-bound values.
- Fixed a bug where NumPy arrays smaller than 8 MiB would be serialized to disk unnecessarily.
- Added a `check_error_stat` option in `basic_compare_func` and a corresponding `--check-error-stat` CLI option to control which statistic (e.g. mean, median, max) of the error is used to determine whether outputs matched.
- A histogram of output/error values will now be displayed at INFO severity on comparison failures. Otherwise, it is displayed at VERBOSE severity.
- Fixed a bug where the histogram display would wrap to subsequent lines.
- Added more information about absolute/relative difference in `basic_compare_func`. For example, it will now print a histogram of the distribution of the outputs and errors.
- Added mean absolute/relative error to `OutputCompareResult`, which is returned by `Comparator.compare_accuracy`. This makes it easier to programmatically access this information.
- Several improvements to the quality of error messages and warnings.
- Fixed a bug where `basic_compare_func` and `DataLoader` would issue warnings when default tolerances/value ranges were used.
- Fixed a bug where command-line tools would fail if a `--timing-cache` argument was provided but the file did not exist.
- `basic_compare_func` will now issue warnings if `atol`/`rtol` contain invalid keys. `DataLoader` will now issue warnings if `val_range` contains invalid keys.
- Added a `tactic_sources` parameter in `CreateConfig` to control TensorRT's tactic sources.
- Added a `--tactic-sources` argument to CLI tools.
- Added a `DeviceView` class in the `cuda` submodule to represent views of GPU memory. `DeviceArray` is now a subclass of `DeviceView`.
- Added support for accepting `DeviceView`s or device pointers in the `Calibrator`. This means that you can now run calibration using data already on the GPU.
- Added support for `DeviceView`s in `TrtRunner.infer()`. Note that `DeviceView`s cannot be used for input shape-tensors, which must be allocated on the host.
- Added support for using `trt.IInt8Calibrator` as the `BaseClass` of `Calibrator`.
- Exposed some lower level functions like `malloc`, `free`, and `memcpy` in the Polygraphy CUDA wrapper.
- Added a `set_profile()` method to `TrtRunner` to control the active optimization profile.
- Added a `-q`/`--quiet` option to CLI tools. This can be used to suppress logging output without eliminating all output like `--silent` does.
- Added a `to_trt()` method to `Profile` to convert it to a TensorRT `IOptimizationProfile`.
- Added a `--force-fallback-shape-inference` option to `debug reduce`.
- Added a `--fail-regex` option to `debug reduce` to distinguish among different types of failures based on command output.
- Changed `TRT_LOGGER` to `get_trt_logger()` to make it work properly with lazy imports.
- Further improved lazy imports such that no modules are required in order to import Polygraphy modules. Using functionality from Polygraphy modules still requires dependencies.
- Various submodules have been restructured. The old import structure is preserved for backwards compatibility.
- Added `Profile.fill_defaults()`, which makes it possible to automatically fill a TensorRT optimization profile with sane default values.
- It is now possible to provide TensorRT optimization profile shapes for a subset of the network inputs. In such cases, the rest of the profile will be populated automatically with `Profile.fill_defaults()`.
- `surgeon extract` will no longer run shape inference unless it is required, e.g. if `auto` is specified for one of the shape/data type arguments.
- ONNX shape inference will now be skipped when `--force-fallback-shape-inference` is enabled in `surgeon extract/sanitize`.
- `debug reduce` will now freeze intermediate shapes in the model if `--model-input-shapes` is provided.
- `IterationResult`s now store `LazyArray` rather than `np.ndarray`. The public interface for `IterationResult` will automatically pack or unpack `np.ndarray`s into/from `LazyArray`, so the change is completely transparent. This can significantly reduce memory usage for tools like `debug reduce` and `run`.
- Attempting to load a non-existent file will now cause a friendly error message to be displayed rather than crashing.
- `surgeon sanitize` will no longer override shapes other than those specified in `--override-input-shapes`.
- Removed the optional `symbol` parameter from `lazy_import`.
- For security reasons, all serialization/deserialization code in Polygraphy has been updated to use JSON instead of `pickle`. Use the included `to-json` tool to convert data serialized with `pickle` to JSON format.
- Split `TacticReplayer` into separate `TacticRecorder` and `TacticReplayer` classes. This provides more fine-grained control over whether to record or replay tactics.
- Deprecated `--tactic-replay` in favor of `--save-tactics` and `--load-tactics`.
- Changed the `check_finite` parameter in `Comparator.validate()` to `check_inf`, since it checks whether values are non-finite rather than the opposite.
- Polygraphy will now validate command-line arguments so that code-injection is not possible.
- `debug diff-tactics` will now work correctly when replay files are in nested directories.
- Added a `--force-fallback-shape-inference` option to `surgeon sanitize` in case ONNX shape inference doesn't work well enough to allow for folding.
- Added a `--calibration-base-class` option to allow changing the base class for the TensorRT int8 calibrator.
- `FoldConstants` will no longer fail if a constant folding pass fails. Set `error_ok=False` to disable this behavior.
- Added support for saving/loading ONNX models with externally stored weights.
- Added support for automatically installing dependencies as they are needed. This behavior can be enabled by setting the `POLYGRAPHY_AUTOINSTALL_DEPS` environment variable to `1`. When auto-install is enabled, Polygraphy can also automatically upgrade existing packages if a newer version is requested.
- Added an `error_ok` option in `InferShapes`, which can be set to `False` to make the loader raise an error when shape inference fails.
- `val_range` in the `DataLoader` now falls back to the default range if no range is specified for an input. `atol` and `rtol` in `CompareFunc.basic_compare_func` now fall back to the default tolerance values if no tolerance is specified for an output.
- Folding shapes is now optional in `FoldConstants`. `surgeon sanitize` now includes a `--no-fold-shapes` option to disable shape folding.
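The fallback behavior described for `val_range`/`atol`/`rtol` amounts to a per-key lookup with a default. A sketch of that pattern, with illustrative names (Polygraphy's actual defaulting logic may differ in details):

```python
def lookup_with_default(per_key, key, default):
    """Options that accept either a single value or a per-name mapping."""
    if not isinstance(per_key, dict):
        return per_key              # a single value applies to everything
    if key in per_key:
        return per_key[key]         # exact per-input/per-output override
    return per_key.get("", default)  # else fall back to a default entry

val_range = {"input0": (0, 1)}
print(lookup_with_default(val_range, "input0", (0.0, 1.0)))  # (0, 1)
print(lookup_with_default(val_range, "input1", (0.0, 1.0)))  # (0.0, 1.0)
print(lookup_with_default((0, 5), "anything", (0.0, 1.0)))   # (0, 5)
```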
- Fixed a bug in `surgeon insert` where input tensors would be disconnected from all their consumers. Previously, in a model with branches, if one entire branch was replaced by `surgeon insert`, the other branch would be invalidated. This is no longer the case.
- `run` will now attempt to avoid introducing a dependency on the `onnx` Python package when using an ONNX model if `--trt` is the only specified runner.
- When `--force-fallback-shape-inference` is set in `surgeon extract`, it will now correctly ignore shapes already inferred in the model.
- ONNX loaders will no longer make a copy of the model unnecessarily. If a copy is desired, the `copy` parameter can be set to `True` for loaders that may modify the model.
- `InferShapes`/`infer_shapes` will now work with ONNX models larger than 2 GiB if a path to the model is provided instead of an `onnx.ModelProto`.
- Fixed a bug where `FoldConstants` would not count nodes within subgraphs.
- Removed `OnnxTfRunner` and associated CLI options.
- Added a `--partitioning` flag to `surgeon sanitize` to control how ONNX-GraphSurgeon partitions the graph during constant folding.
- Added a `--cleanup` flag to `surgeon sanitize` to remove dead layers in ONNX models.
- The `ExtractSubgraph` loader will now fall back to using shapes/dtypes defined in the model when none are specified.
- `surgeon sanitize` no longer runs inference when the `--override-input-shapes` option is set. Instead, intermediate shapes are cleared.
- `surgeon extract` will no longer override shapes or data types already set in the model when running fallback shape inference.
- Added support for list attributes in `surgeon insert`.
- Added a `val_range` parameter to the data loader, which is more generic than `int_range`/`float_range`, which are now deprecated.
- Added support for per-input data ranges to the `val_range` parameter.
- Added a `--val-range` CLI option to set input ranges on the command-line.
- Added `:` as a valid separator for various options and `[dim0,...,dimN]` as valid syntax for shapes. For example, instead of: `--inputs input0,3x4,int64 input1,4x64x64,float32`, you can now optionally use: `--inputs input0:[3,4]:int64 input1:[4,64,64]:float32`. The new and old styles cannot be mixed.
- Added support for specifying per-output top-k values to CLI tools.
- Added a `--trt-config-script` argument, which allows CLI tools to accept scripts that define functions that create TensorRT builder configurations.
- Added a `--data-loader-script` argument, which allows CLI tools to accept scripts that define data loaders.
- Added a new example for the `convert` CLI tool, which shows how to use a custom data loader for int8 calibration on the command-line.
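A data-loader script of the kind `--data-loader-script` accepts is just a Python file defining a generator of feed dicts (input name to array). The sketch below assumes the conventional `load_data` entry-point name and made-up input names/shapes; check the Polygraphy examples for the exact contract:

```python
# data_loader.py -- minimal data-loader script sketch.
import numpy as np

INPUT_SHAPES = {"x": (1, 3, 28, 28)}  # hypothetical network input

def load_data():
    rng = np.random.default_rng(seed=0)  # fixed seed => reproducible runs
    for _ in range(4):                   # number of batches to supply
        yield {name: rng.random(shape, dtype=np.float32)
               for name, shape in INPUT_SHAPES.items()}

for feed_dict in load_data():
    print(feed_dict["x"].shape)  # (1, 3, 28, 28)
```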
- Fixed a bug where `debug reduce` would remove branches even if they were required to reproduce failures.
- Added support for string input types in `OnnxrtRunner`.
- Added a `reduce` subtool to `debug`, which can reduce failing ONNX models to the smallest possible failing subgraph.
- ONNX loaders will no longer modify the original model provided, but instead make a shallow copy.
- Added an example to `dev/` showing how to write new command-line tools.
- Verbose TensorRT network logging will no longer fail to show attributes for layers on older versions of TensorRT.
- `convert` can now automatically determine the output model type based on the file extension.
- Added immediately evaluated functional variants for all loaders exported by Polygraphy. The variants use the same name as the loaders, except `snake_case` instead of `PascalCase`. See the example for details.
- Polygraphy no longer has `numpy` as an install requirement. Note, however, that most, but not all, APIs and CLI tools in Polygraphy still require `numpy`.
- Removed `func.invoke()` since immediately evaluated functions now supersede it.
- Fixed a bug where some `debug` subtools would write engines to the wrong path.
- Added a `FoldConstants` loader for ONNX models.
- Added an `ExtractSubgraph` loader for ONNX models.
- Moved the `--fp-to-fp16` option to `convert`.
- Added `ConvertToFp16` as a separate loader for ONNX models.
- Added an `InferShapes` loader for ONNX models.
- `surgeon sanitize` will now run shape inference by default.
- `Modify<X>` loaders have been renamed to `Modify<X>Outputs` to better reflect their purpose.
- `surgeon sanitize` can now run multiple passes of constant folding to deal with nodes that may not be folded after the first pass (for example, `Shape` nodes in cases where ONNX shape inference does not complete).
- Added an experimental `debug` subtool, which includes `build` and `diff-tactics` (formerly part of `flaky`) and `precision` (formerly a separate tool).
- `flaky diff-tactics` will now only show layers that have potentially bad tactics. To view an entire tactic replay, use `inspect tactics`.
- `flaky repeat` will now only log command `stderr` output with `ERROR` severity if the command failed. Otherwise, `stderr` output is logged with `WARNING` severity.
- `TacticReplayer` can now accept a `TacticReplayData` instance directly. `TacticReplayData` can now be constructed manually instead of relying on TensorRT types.
- The `flaky` and `precision` tools have been removed and replaced by the `debug` subtool, which includes the functionality of both.
- Added a `POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS` environment variable to enable internal correctness checks at runtime. By default, these checks are disabled. A failure in such a check typically indicates a bug in Polygraphy.
- Added context managers for CUDA helper classes. This helps ensure they are correctly freed.
- Added a `sparse_weights` parameter to `CreateConfig`, which enables TensorRT optimizations related to sparse weights.
- Added a `--sparse-weights` option to various CLI tools.
- Added checks for cases where paths provided to `BytesFromPath` did not exist.
- Added `__enter__`/`__exit__` to `Calibrator` so that device buffers can be reliably freed after calibration using a context manager.
- Added a `fp_to_fp16` parameter to `ModifyOutputs`, which will use `onnxmltools` to convert float tensors in the model to 16-bit floats.
- Added a `--fp-to-fp16` CLI argument to various tools.
- Added support for `float`, `int`, and `str` attributes in `surgeon insert`.
- Added an `InvokeFromScript` loader, which can import and invoke a function from a Python script.
- Added support for loading TensorRT networks from Python scripts to various CLI tools. CLI tools can now accept a Python script in place of a model file. The script should define a `load_network` function that takes no arguments and returns a TensorRT builder, network, and optionally parser. See the example for details.
- Added an experimental `template` tool that can generate template files.
    - Added a `trt-network` subtool that can generate a template script for defining TensorRT networks using the network API.
- Added a `SaveBytes` loader to facilitate writing bytes to a file between loaders.
- Added an experimental `flaky` tool that can help debug flaky failures.
    - Added a `repeat` subtool, which will run a command repeatedly and sort artifacts into `good` and `bad` directories.
    - Added a `diff-tactics` subtool, which compares known-good and known-bad tactic replay files to determine which tactics may be the source of error.
- `EngineFromNetwork` and `CreateConfig` no longer use the global timing cache by default.
- Changed the `--timing-cache` default in CLI tools to `None`.
- Changed the `timing_cache` parameter to `load_timing_cache` and `save_timing_cache` in `CreateConfig` and `EngineFromNetwork` respectively.
- Runners will now raise errors in `infer` if the provided input data types or shapes do not match expected types and shapes. This behavior can be disabled by setting `check_inputs=False`.
- Changed the `--toposort` default to off in `surgeon` tools, as ONNX models are typically topologically sorted.
- The logger will now log messages with `WARNING` or greater severity to `sys.stderr` instead of `sys.stdout`.
- Removed `CNTKRunner` and the `--cntk` CLI option.
- Removed the experimental `--network-api` flag in CLI tools. This is superseded by the `template trt-network` subtool.
- Fixed memory leaks in `EngineFromNetwork`, `EngineFromBytes`, and `TrtRunner`.
- Added support for timing caches in `EngineFromNetwork` and `CreateConfig`. The former can generate caches, while the latter can load them, resulting in much faster engine builds. By default, Polygraphy will use a global timing cache in the temporary directory.
- Added a `--timing-cache` option to various CLI tools.
- Added an `EngineBytesFromNetwork` TensorRT loader to provide serialized engines directly.
- Added a `BytesFromEngine` TensorRT loader to provide a means of in-memory engine serialization.
- Added an experimental `convert` subtool, which can convert models to various other formats.
- Added an `algorithm_selector` parameter to `CreateConfig` to allow the user to override TensorRT's tactic choices.
- Added a `TacticReplayer` algorithm selector to allow for recording and replaying tactics in the TensorRT builder. This makes it possible to make the TensorRT builder behave deterministically.
- Added an experimental `--tactic-replay` option to various CLI tools to make it possible to record to and replay from tactic replay files.
- Added an experimental `inspect` subtool, `tactics`, which can display tactic replay files in a human readable format.
- The `surgeon sanitize` subtool can now also modify model outputs.
- `surgeon insert` will now preserve graph input and output names.
- Fixed a bug where the CUDA wrapper could not allocate buffers larger than 3 GiB.
- `TrtRunner` can now optionally accept a `context` directly instead of an `engine`.
- `basic_compare_func` will now show mismatched indices in addition to mismatched values.
- Added an experimental `surgeon` subtool, `insert`, which can insert new nodes into an ONNX model.
- Added an experimental `surgeon` subtool, `sanitize`, which can remove unused nodes and fold constants in an ONNX model.
- Added `--load-inputs` and `--save-inputs` to provide a mechanism to supply custom input data on the command line.
- Added `func.invoke()`, a function that calls a provided callable. This can be useful to make it more obvious that a loader is being immediately evaluated. For example: `EngineFromNetwork(...)()` vs. `func.invoke(EngineFromNetwork(...))`.
- Added per-output tolerance support in `basic_compare_func`.
- Added per-output tolerance support to the `--atol` and `--rtol` command-line options.
- Renamed `inspect results` to `inspect data` since it can now also be used to inspect input data, not just results.
- `Comparator.compare_accuracy` now supports comparing a single runner against itself.
- Removed the experimental surgeon subtools `prepare` and `operate`, as they were difficult to maintain and not very useful.
- Fixed a memory leak due to `IBuilderConfig` not being properly freed in the `EngineFromNetwork` loader.
- Fixed memory leaks on exceptions in TensorRT loaders.
- Fixed a bug in `inspect model` where `dim_param`s in ONNX models would show up as `-1`.
- Shape values in `TensorMetadata` can now be strings to indicate dynamic dimensions.
- `TRT_LOGGER` is now exported under `polygraphy.backend.trt`.
- Fixed a bug in `surgeon extract` where ONNX models using `dim_param` would be rejected.
- Added missing copyright headers.
- Added an `--input-shapes` alias for the `--inputs` option in `run` to better reflect its purpose.
- `inspect model` will no longer show `dtype`/`shape` as `None` if the information is not present in the model. Instead, these are now omitted.
- Fixed a bug where boolean outputs would cause a crash in `basic_compare_func`.
- Fixed a bug where `TrtRunner` would use the wrong shapes for empty tensor outputs.
- Fixed a bug where the `Calibrator` would not re-check the cache when `reset()` was called.
- Added a `-v`/`--version` flag to `polygraphy`.
- Cleaned up unnecessary logging output and fixed formatting.
- Added new modes to `inspect model` to control whether to show weights in the model.
- Added a `-s`/`--show-values` option to `inspect results` to display output values.
- Added an experimental `--top-k` flag to `run`, which will apply a Top-K before comparing outputs.
- Added `exclude_outputs` to `ModifyOutputs` and `ModifyNetworkOutputs`.
- Added experimental `--onnx-exclude-outputs` and `--trt-exclude-outputs` options to selectively unmark outputs.
- Fixed a bug in `inspect model` for ONNX models containing nodes with Tensor attributes.
- Fixed a bug where `DeviceArray.copy_from` would segfault in rare cases.
- General cleanup and addition of missing docstrings.
- Fixed a bug where `DataLoader` would use a shape provided by the user even for static shapes in the model.
- Fixed a bug where `DataLoader` would incorrectly report certain tensors as shape tensors.
- Fixed a bug where the `DataLoaderCache` would stop checking the cache after the first miss.
- Added an `extend` decorator, which makes it easier to extend existing loaders.
- Added more API examples.
- `Comparator.compare_accuracy` will now display an accuracy summary after processing all iterations.
- Added a `CreateNetwork` loader to create new TensorRT networks.
- Added an experimental `--network-api` option that works with `--gen` to allow manually defining a TensorRT network.
- `Calibrator` can now accept a file-like object for `cache` instead of just a file path.
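The `extend`-decorator idea above can be sketched as a decorator whose decorated function receives the wrapped loader's result and may post-process it. This is illustrative only; Polygraphy's real decorator (in `polygraphy.func`) is more general:

```python
def extend(base_loader):
    def decorator(post_fn):
        def extended(*args, **kwargs):
            result = base_loader(*args, **kwargs)
            modified = post_fn(result)
            # Returning None from post_fn keeps the (possibly mutated) result.
            return result if modified is None else modified
        return extended
    return decorator

def load_config():
    return {"fp16": False}

@extend(load_config)
def load_config_fp16(config):
    config["fp16"] = True   # mutate in place; implicit None return

print(load_config_fp16())  # {'fp16': True}
```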
- Fixed various errors in API documentation.
- `EngineFromBytes` will now call `trt.init_libnvinfer_plugins` before attempting to deserialize the engine.
- Added HTML docs for the Python API.
- Fixed a bug where the data loader would not support cases where `int_min` == `int_max` when bounding input data.
- Fixed a bug where `OnnxrtRunner` would report incorrect metadata for ONNX models using `dim_param` for dynamic dimensions.
- `CreateConfig` now accepts a `strict_types` argument.
- Added a new `polygraphy` binary, which includes several tools.
- Added an experimental new tool: `precision`, which can be used to figure out what layers to run in higher precision in TensorRT to achieve the desired accuracy.
    - Added a `bisect` subtool that does a binary search.
    - Added a `linear` subtool that does a linear search.
    - Added a `worst-first` subtool that marks the layers that introduce the most error first.
- Added a new tool: `inspect`, to inspect supported files.
    - Added `model`, which displays information about models.
    - Added `results`, which displays information about saved `RunResults`.
- Added back `subprocess_polling_interval` to `Comparator.run()`, as this is still required in certain rare cases.
- Optimization passes are now optional in `OnnxFromTfGraph` and can be disabled by setting `optimize=False` in the constructor.
- Runners now include an `is_active` property, which indicates whether the runner is currently activated.
- Added an experimental new tool: `surgeon`, which can be used to modify ONNX models more easily than using ONNX-GraphSurgeon.
    - Added `prepare` and `operate`, which can be used to modify an ONNX model using a JSON configuration.
    - Added `extract`, which can extract ONNX subgraphs with a single command.
- Added `--onnx-outputs` and `--trt-outputs` to set outputs in the corresponding loaders.
- Added a passthrough loader, `LoadPlugins`, that can wrap any other loader and load plugins.
- `EngineFromNetwork` will no longer free the builder, network, and parser if they are provided directly (as opposed to via a loader). `TrtRunner` will no longer free the engine if it is provided directly (as opposed to via a loader).
- All file saving arguments now take file paths instead of directories. This makes it easier to know exactly where each file is being written.
- `compare_func` in `Comparator.compare_accuracy` now accepts a function that returns anything convertible to a boolean, rather than requiring a boolean.
- `basic_compare_func` will now return information about required tolerances after `Comparator.compare_accuracy`.
- `Calibrator` can now be configured to inherit from a different TensorRT calibrator base class.
- ONNX GraphSurgeon is no longer required to mark outputs in ONNX models.
- `TrtLegacyRunner` no longer depends on `pycuda`.
- `TrtRunner` will now only reset context shapes if the shapes changed. This should improve performance.
- `DataLoader` now takes `int_range` and `float_range` parameters, so min/max can be provided more concisely.
- All `Loader`s and `Runner`s were renamed to better reflect their purpose and to improve readability.
- Renamed `warm_up_runs` to `warm_up`.
- `Calibrator`'s `data_loader` parameter now accepts any generator or iterable instead of requiring a special type.
- `Comparator.run`'s `data_loader` parameter now accepts any generator or iterable instead of requiring a special type.
- The included `DataLoader` can now be used as an iterable, and its iteration length can be controlled via the `iterations` parameter.
- Renamed `--input-shape` to `--inputs`.
- Renamed `--min-shape`/`--opt-shape`/`--max-shape` to `--trt-min-shapes`/`--trt-opt-shapes`/`--trt-max-shapes`.
- `DataLoader` now accepts an `input_metadata` parameter which can be used to override shapes and data types.
- Split off `layerwise` and `outputs` functionality into separate `Modify` loaders.
- Split off artifact saving functionality into separate `Save` loaders.
- Renamed `--read` options to `--load`, and `--write` to `--save`.
- Renamed `--read-outputs`/`--write-outputs` to `--load-results`/`--save-results`.
- `Calibrator` no longer requires `input_metadata` to be set if the data loader does not need it.
- `TfRunner` now uses a `CreateConfig` loader to supply configuration.
- `TfRunner` and `OnnxrtRunner` now take a `BuildSession`, so that custom sessions can be used.
- Removed iteration arguments from `Comparator.run()` and `Calibrator`. Instead, these now iterate the provided data loader until it runs out of data.
- Removed the `--load-engine` option from `polygraphy`. Engines can now be provided as models directly, e.g. `polygraphy run example.engine --trt`.
- `polygraphy_exec` and `polygraphy_gen` were removed. They are superseded by the `run` subtool of `polygraphy`.
- The `--layerwise` and `layerwise` options have been removed. Layerwise behavior is now possible with `outputs=constants.MARK_ALL` or `--<framework>-outputs mark all`.
- Fixed bugs in `Comparator.validate` that would cause it not to correctly display non-finite values.
- `Calibrator` will now warn if a cache exists but is empty.
- `DataLoader` will now use a fixed seed value unless otherwise specified. This ensures consistent run-to-run behavior.
- The default `find_output_func` will no longer compare outputs whose names don't match if there is another output that does match.
- Fixed a bug where custom names provided to runners would still be suffixed with a timestamp.
- Fixed a bug where regular TensorRT calibrators could not be used with `CreateConfig`.
- The missing subtool warning will no longer be displayed if that subtool is not being used.
- `basic_compare_func` now accepts a `find_output_func` parameter, allowing users to control which outputs are compared between results.
- The `--load-outputs` argument can now accept multiple different files. Outputs from each of these will be read in order.
- Added an implicit batch ONNX network loader for the legacy TensorRT runner. This will not work with recent versions of the parser.
- Added a `RunResults` class, which replaces the `OrderedDict` that `Comparator.run` previously returned (the structure is unchanged).
- `layerwise` mode will no longer mark constants as outputs.
- The default `compare_func` in `Comparator.compare_accuracy` will now always iterate over the output names in the first `IterationResult` and attempt to find them in the second. The order of the `IterationResult`s provided to this function can be modified either by setting `comparisons` in `Comparator.compare_accuracy`, or by changing the order of runners in `Comparator.run`.
- Improved `polygraphy_gen` output formatting.
- Renamed `RunResult` to `IterationResult` to better reflect its purpose.
- Default runner names now include timestamps to disambiguate when saving and loading multiple runners.
- `graphsurgeon` is no longer a dependency of Polygraphy.
- Logger settings in `polygraphy_exec`/`polygraphy_gen` are now set prior to any logging output.
- The Comparator will no longer attempt to decompress all `bytes` objects sent over the queue when using subprocesses.
- Added `OnnxExtWeightsNetworkLoader` to support loading ONNX models with externally stored weights into TensorRT.
- Added a `TensorMetadata` class to replace dictionaries that were used across Polygraphy.
- Added `CaffeNetworkLoader` for the `TrtLegacyRunner`.
- `polygraphy_exec` and `polygraphy_gen` will no longer use subprocesses by default. To revert to the old behavior, the `--use-subprocess` flag must now be explicitly provided.
- `SerializedEngineLoader` now accepts a `buffer_loader`, so that a function that loads a serialized engine may be provided instead of the serialized engine itself.
- The default opset for `OnnxFromTfGraph` has been updated to `11`.
- `polygraphy_exec` and `polygraphy_gen` now correctly handle cases where no model file is provided.
- Added a
PolygraphyException
class to serve as a base class for exceptions raised by Polygraphy.
ConfigLoader
now accepts a list ofProfile
s to support multiple optimization profiles.- Changed the format of CLI shapes arguments to enable specifying multiple profiles.
- Moved the `outputs` argument from `TfRunner` to the TensorFlow loaders.
- Polygraphy now includes a thin `ctypes` wrapper around the CUDA runtime library, accessible in `util/cuda.py`.
- `TrtRunner` no longer depends on `pycuda`, and instead uses the included CUDA wrapper.
- Loader parameters may now be loaders themselves, or the result of invoking a loader.
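As a rough illustration of this loader-composition pattern (all names here are hypothetical simplifications, not Polygraphy's actual APIs):

```python
def engine_from_network(network_loader):
    # A loader parameter may be another loader (a callable) or an
    # already-materialized value; invoke it only in the former case.
    network = network_loader() if callable(network_loader) else network_loader
    return f"engine({network})"

def load_network():
    return "network"

# Both spellings are accepted: pass the loader itself (lazy evaluation)...
print(engine_from_network(load_network))
# ...or the result of invoking it (eager evaluation).
print(engine_from_network(load_network()))
```

Accepting either form lets loaders be chained into lazy pipelines while still allowing pre-built objects to be passed directly.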
- Improved the quality of Comparator messages when there are mismatches.
- `basic_compare_func` will now preserve output ordering in the results.
- Made `EngineFromNetwork` compatible with TensorRT 7.0.
- Restructured the ONNX runner, and added layerwise functionality (using ONNX-GraphSurgeon).
- Added `--timestamp` and `--line-info` options to `polygraphy_exec` to enable logging of timestamps and line numbers respectively.
- Added a `--no-letter` option to disable severity letter prefixes in log messages.
- Added `register_callback` to the logger, which registers a callback that will be called whenever the severity changes.
- Added `Logger.verbosity()`, which returns a context manager that can be used to temporarily change logging severity.
- Added new variants to `--model-type` in `polygraphy_exec`: `keras`, `ckpt`; renamed `tf` to `frozen`.
- Added `ConfigLoader`, which can be passed to `EngineFromNetwork` to customize the build configuration prior to building.
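The `Logger.verbosity()` change above follows a standard context-manager pattern; a simplified sketch of the idea (not Polygraphy's actual logger):

```python
import contextlib

class Logger:
    def __init__(self):
        self.severity = "INFO"

    @contextlib.contextmanager
    def verbosity(self, severity="CRITICAL"):
        # Temporarily override the severity, restoring the old
        # value on exit even if an exception is raised.
        old_severity = self.severity
        self.severity = severity
        try:
            yield
        finally:
            self.severity = old_severity

logger = Logger()
with logger.verbosity("VERBOSE"):
    print(logger.severity)  # temporarily VERBOSE inside the block
print(logger.severity)      # restored afterwards
```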
- The logger no longer displays timestamps and line numbers. These can be enabled by setting the `timestamp`/`line_info` properties respectively to `True`.
- The logger now relies on the `colored` module to provide colored output.
- `polygraphy_exec` now runs runners in the order in which they were specified.
- Greatly shortened import paths by removing `_runner` suffixes and shortening framework names (e.g. `tensorflow_runner` -> `tf`).
- The `runners` submodule has been renamed to `backend`.
- `TrtRunner` has been renamed to `TrtLegacyRunner`.
- `TrtRunnerV2` has been renamed to `TrtRunner`.
- `polygraphy_gen` is now at parity with `polygraphy_exec`.
- Removed `--tftrt` as a separate runner in `polygraphy_exec`; instead it is now an option for the `--tf` runner.
- Removed `--tftrt-gpu-memory-fraction` and renamed `--tf-gpu-memory-fraction` to `--gpu-memory-fraction` in `polygraphy_exec`.
- Removed `--tfonnx`; this functionality is instead provided by `--onnxrt` when using a TensorFlow model in `polygraphy_exec`.
- Removed the `Experimental` argument section in `polygraphy_exec`. All functionality has now been integrated into non-experimental arguments.
- Removed the `preprocess_network` argument from `EngineFromNetwork`. This functionality can be achieved by wrapping the network loaders instead.
- `Comparator.run` will now forcefully terminate the subprocess if it does not exit on its own.
- Added TF32 support to the legacy `TrtLegacyRunner`.
- Various improvements to automatic shape matching for cases where shapes between runners do not match exactly.
- Changed `BaseRunner` so that runners can now implement `activate()`/`deactivate()` instead of `__enter__()`/`__exit__()`.
- `polygraphy_exec` now defaults to running just a single iteration of inference.
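The `activate()`/`deactivate()` change to `BaseRunner` can be sketched as follows (a hypothetical simplification; the real `BaseRunner` has more responsibilities):

```python
class BaseRunner:
    # __enter__/__exit__ are handled once in the base class, so
    # subclasses only override activate()/deactivate().
    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.deactivate()

    def activate(self):
        pass  # subclasses allocate resources (sessions, engines, buffers)

    def deactivate(self):
        pass  # subclasses release those resources

class MyRunner(BaseRunner):
    def activate(self):
        self.session = "loaded"

    def deactivate(self):
        self.session = None

with MyRunner() as runner:
    print(runner.session)  # resources are live inside the with-block
```

Centralizing the context-manager protocol in the base class keeps runner subclasses focused on resource setup and teardown.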
- The `--accuracy` flag has been removed from `polygraphy_exec`, as this is now the default behavior.
- TensorRT runners now use the same builder to build the network and engine, instead of using a separate builder for each.
- Fixed a bug in `try_match_shape`.
- Added a `tf32` parameter, as well as a `--tf32` flag, for TensorRT.
- Added support for `dim_param` in ONNX.
- The `fp16_mode` and `int8_mode` parameters have been renamed to `fp16` and `int8` respectively.
- `polygraphy_exec` will now use the runtime shapes specified rather than always using `OPT` shapes from the TensorRT profile.
- Improved shape matching logic in `DataLoaderCache`.
- Added `start_index` and `end_index` to `Comparator.run` to make it easy to skip over inputs from the data loader.
- Added `CompareFunc` to provide built-in comparison functions.
- Added `PostprocessFunc` to provide built-in post-processing functions.
- `Comparator.compare_accuracy` now returns an `AccuracyResult` object, which contains much more information about the results of the comparisons.
- Added a `percentage()` function to `AccuracyResult` to provide an easy way to figure out the percentage of passed iterations.
- Replaced `RunInfo` with `IterationResult`. The latter only stores information about a single iteration for a single runner.
- `compare_func` in `Comparator.compare_accuracy` is now a `Callable(IterationResult, IterationResult) -> Dict[str, bool]`.
- `warm_up_runs` now defaults to `0`, and `end_index` to `1`.
- Ordering of outputs in a single iteration is now preserved in `CompareFunc.basic_compare_func`.
- `use_subprocess` now defaults to `False` in `Comparator.run()` (it still defaults to `True` in `polygraphy_exec`).
- `Calibrator` now takes `start_index` and `end_index` arguments instead of `max_items`.
- Removed the `Comparator.compare` function, since `Comparator.compare_accuracy` includes all of its functionality.
- `iterations` in `Comparator.run` has been removed and replaced by `start_index` and `end_index`.
- Removed the `subprocess_polling_interval` argument, as `Comparator` can now properly detect when the subprocess terminates.
- `Comparator.run()` will no longer hang if there is a segfault in the subprocess.
- Added `--int-min`, `--int-max`, `--float-min`, and `--float-max` arguments to `polygraphy_exec`.
- Added an `--explicit-precision` option to `polygraphy_exec` to enable QAT models in TRT.
- Added empty tensor support. Empty tensors are tensors whose shapes contain one or more 0s.
- When `--load-outputs` or `--save-outputs` is specified to `polygraphy_exec`, `seed` will default to `1` to ensure consistent inputs across runs.
- Added a `--calibration-cache` option to `polygraphy_exec` to enable supplying a calibration cache.
- Added a `--no-color` option to disable color logging.
- Added `GraphOptimizerLoader` for freezing TensorFlow graphs, and a `--freeze-graph` option to `polygraphy_exec`.
- Added `--load-outputs` and `--save-outputs` to `polygraphy_exec` for comparing across executions.
- Added `KerasLoader` for loading models stored in `hdf5` format.
- Added a constant-folding pass to `GraphOptimizerLoader` for TensorFlow graphs.
- Updated `Calibrator` so that it will now use the opt dimension of a profile for networks with dynamic shapes.
- Updated the legacy TensorRT runner to use `Loaders` for easier UFF debugging.
- `Calibrator` will no longer allocate buffers if a calibration cache was provided.
- Added generation of ONNX code to `polygraphy_gen`.
- Added default implementations of some `BaseRunner` methods.
- Added `last_inference_time()` to `BaseRunner` so that `infer()` now only needs to return outputs.
- Added `Calibrator` for int8 calibration, along with additional parameters to `EngineFromNetwork`.
- Better warnings for user-defined implementations of various APIs.
- `DataLoaderCache` will now warn loudly when a set of inputs needs to be regenerated.
- Cleaned up the `Comparator` `run()` function.
- Moved most `save_*` options into loaders rather than runners.
- Changed `BaseDataLoader.next()` to take an index as an argument. This way, inputs can be reliably repeated across runners.
- Moved all `layerwise` parameters into loaders rather than runners.
- `Loader`s are now interchangeable with Python `Callable`s.
- `DataLoader`s are now interchangeable with Python `Callable`s.
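The index-based `next()` design can be sketched as follows (a hypothetical simplification using only the standard library; the real `DataLoader` produces NumPy buffers):

```python
import random

class DataLoader:
    def __init__(self, seed=1):
        self.seed = seed

    def next(self, index):
        # Seeding from (seed + index) makes iteration `index` a pure
        # function of the index, so every runner sees identical inputs
        # no matter when or how often an iteration is regenerated.
        rng = random.Random(self.seed + index)
        return {"input0": [rng.random() for _ in range(4)]}

loader = DataLoader()
print(loader.next(0) == loader.next(0))  # same index, same data
```

Because the data depends only on the index and seed, a cache can safely regenerate any iteration for a late-joining runner.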
- `DataLoader` no longer generates all-`True` values for boolean types.
- Various bug fixes in `polygraphy_gen`.
- `DataLoaderCache` is now sent over the queue when runners are run in subprocesses. This resolves an issue where the cache was not being updated correctly.
- `Comparator` now updates runners correctly when using a subprocess.
- Added a `--no-fold-constant` option to prevent `OnnxFromTfGraph` from doing constant folding in the TensorFlow graph.
- Added an experimental `polygraphy_gen` script that enables generation of template Python scripts for running Polygraphy.
- Bug fix for cases where TensorFlow nodes with no outputs were recognized as graph outputs by `GraphSurgeon`.
- Added a `name` parameter to `CheckpointLoader` in case the checkpoint does not include a `checkpoint` file.
- `TFTRTLoader` now accepts any kind of TensorFlow graph loader.
- Bug fix in the `TrtRunner` `Buffers` so that no-op reshapes (no reallocation) are handled correctly.
- Added `check_inf`, `check_nan`, and `fail_fast` options to `Comparator.validate()`.
- Cleaned up the `Buffers` implementation for `TrtRunner`; eliminated an unnecessary copy that was happening on the host input.
- Improved logic for matching output names in `util.find_in_dict()`.
- `TrtRunner` will no longer call the `context`'s shape-setting functions on non-dynamic inputs.
- Bug fix for volume computation for scalars.
- Updated `DataLoader` to handle scalars correctly; added several tests.
- Added various utility functions as static members of `TrtRunner`, e.g. a `create_network` function to simplify TensorRT's network flags.
- `EngineFromNetwork` will now mark network outputs when `layerwise=True`.
- Added support for `bool` outputs in `Comparator`.
- Replaced `OnnxEngineLoader` with `OnnxNetworkLoader` and `EngineFromNetwork`. This allows for more flexibility in building engines from TensorRT networks.
- Added an `allow_growth` option to `TfRunner` to work around `CUDNN_STATUS_INTERNAL_ERROR`. When `allow_growth` is enabled, the error disappears.
- `DataLoaderCache` will now attempt to permute inputs in cases where shapes do not match exactly (e.g. NCHW vs. NHWC inputs).
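The shape-permutation matching described for `DataLoaderCache` can be illustrated with a small shape-only sketch (`find_permutation` is a hypothetical helper, not Polygraphy's implementation):

```python
import itertools

def find_permutation(src_shape, target_shape):
    # Search for an axis permutation that maps src_shape onto
    # target_shape, e.g. NCHW (1, 3, 224, 224) onto NHWC (1, 224, 224, 3).
    for perm in itertools.permutations(range(len(src_shape))):
        if tuple(src_shape[i] for i in perm) == tuple(target_shape):
            return perm
    return None  # shapes are not permutations of each other

print(find_permutation((1, 3, 224, 224), (1, 224, 224, 3)))  # (0, 2, 3, 1)
```

Once a permutation is found, the cached buffer can be transposed accordingly before being handed to the runner whose input layout differs.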
- Fixed a bug in `polygraphy_exec` which caused it to ignore user-defined profiles.
- Added support for many more ONNX data types.
- Added support for `int8` and explicit precision mode in `TrtRunner`.
- Added a `preprocess_network` parameter to `OnnxEngineLoader` so that the network can be modified before it is used for building.
- `TrtRunner` will now attempt to generate sane default shapes in cases with dynamic shapes where no profiles are provided.
- `DataLoader` no longer overrides static shapes in the model, but issues a warning if an override is requested.
- `DataLoader` now accepts shape tensor inputs in its `default_shapes` parameter.
- Added timestamps to logging output.
- `Comparator` can now catch segfaults in runners properly.
- Added options for `DataLoader` to be able to specify input bounds.
- Added smarter matching for input metadata in the `DataLoaderCache`.
- The default `subprocess_polling_interval` is now 30 seconds.
- `Comparator` now attempts to partially match output names when no exact matches are found.
- Added a `subprocess_timeout` parameter to `Comparator.run` to prevent hangs when a subprocess does not terminate.
- Added a `subprocess_polling_interval` parameter to allow the process to be polled, so that failing processes can be terminated before the full `subprocess_timeout`.
- If the ONNX checker fails due to the IR version of the model being too new, Polygraphy now ignores the error and continues.
- `OnnxEngineLoader` now accepts an `onnx_loader` for better flexibility in loading models.
- `polygraphy_exec` now supports running TF models in TRT via the tf2onnx converter.
- The legacy `TrtLegacyRunner` now only supports UFF models.
- Added `BaseModelLoader`, which can be used to load models. This allows for reuse of existing runners with different import paths. For example, `OnnxrtRunner` can be used with `OnnxFromTfGraph` in order to run a TensorFlow frozen graph via ONNX-Runtime.
- Implemented `ModelLoader`s for `TfRunner`, including a frozen model loader, checkpoint loader, and TF-TRT loader.
- `OnnxFromTfGraph` now accepts a TensorFlow `ModelLoader` to support a wider variety of input formats.
- Updated the legacy `TrtLegacyRunner` to use the `get_input_metadata` API, so it is usable for UFF models.
- The Comparator will now look at the union of all outputs from all runners when checking for common outputs.
- `TrtRunner` will no longer mark layers within the loop body as network outputs in `layerwise` mode.
- `DataLoaderCache` now falls back to reusing inputs based on order if names do not match exactly.
- `DataLoader` now accepts a `default_shapes` parameter to override dynamic shapes.
- Added a `get_input_metadata` API to `BaseRunner`. Overhauled runners so they no longer need to handle dynamic input shapes individually.
- Added a `DataLoader` class which can be used to feed data to the Comparator.
- Added `DataLoaderCache` so that the data loader does not have to load inputs multiple times for each runner.
- `Comparator.compare_accuracy` now fails if no outputs were compared.
- Removed support for implicit-batch ONNX models in `TrtLegacyRunner`. You should use `TrtRunner` for ONNX models instead.
- Removed `python2` support.
- Bug fixes for TensorFlow graphs.
- Bug fixes for `polygraphy_exec` when using the legacy `TrtLegacyRunner`.
- Bug fixes for `TrtRunner` for cases with multiple outputs.
- Added support for compression during communication between the runner subprocesses and the main `Comparator` process. This is because `Pipe`s and `Queue`s can only send objects smaller than 2 GB.
- Added timeouts to reduce the possibility of hangs in runners.
- Added a `--fail-fast` option to `polygraphy_exec` and a corresponding `fail_fast` option to `Comparator.compare()`. This is useful for determining the first layer at which two models diverge.
- Added `TrtRunner`, which can be used to run TRT networks with dynamic shapes. Currently it only supports ONNX.
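The compression approach for inter-process communication mentioned above can be sketched as follows (a minimal illustration of the idea, not Polygraphy's implementation):

```python
import pickle
import zlib
from multiprocessing import Pipe

def send_compressed(conn, obj):
    # Compress the pickled payload before it crosses the pipe,
    # shrinking large runner outputs below the pipe's size limit.
    conn.send_bytes(zlib.compress(pickle.dumps(obj)))

def recv_compressed(conn):
    return pickle.loads(zlib.decompress(conn.recv_bytes()))

parent_conn, child_conn = Pipe()
send_compressed(parent_conn, {"output0": [0.0] * 1000})
print(recv_compressed(child_conn))
```

Compression helps most for highly redundant buffers (e.g. zero-filled tensors); incompressible data still has to fit under the transport's limit.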
- Runners no longer need to specify inputs up front; they can now be specified after `__enter__` is called. This greatly simplifies much of the logic in several runners.
- `RunInfo` no longer contains data about the inputs used.
- `TFOnnxrtRunner` now accepts an opset option when converting graphs to ONNX.
- All runner files are now suffixed with `_runner` to disambiguate them from system packages.
- Fixed an issue that prevented `EXTRA_VERBOSE` logging output from TRT from being displayed.
- Added a `--uff-order` option in case the automatically determined order is wrong.
- Added an experimental `--build-only` option to `polygraphy_exec`.
- The Comparator will now attempt to permute outputs with mismatched shapes when `check_shapes` is disabled.
- Lowered the default GPU memory fraction, as TensorFlow has OOM issues when it is set too high.
- Added `TFOnnxrtRunner` and a `--tfonnx` option to `polygraphy_exec`.
- Added `OnnxrtRunner` and moved `TFOnnxrtRunner` into `onnx_runner.py`.
- Added a `--save-onnx` option for `OnnxrtRunner`.
- Changed the `--onnx` `polygraphy_exec` option to `onnxtf` to disambiguate it from `--onnxrt`.
- Added `CNTKRunner` and a `--cntk` option to `polygraphy_exec`.
- Changed the default shape value to 1. This is the value that is set when no input dimension is specified.
- Added support for loading TF checkpoints.
- Added support for overriding automatically determined outputs in the TF and TF-TRT runners. Added a `--tf-outputs` argument to `polygraphy_exec`.
- Fixed input shape mismatches between ONNX-RT and TF.
- Added a `--plugins` option to `polygraphy_exec` for loading TRT plugins.
- Added a function in the Comparator to perform output validation, and a corresponding flag in `polygraphy_exec`.
- Runners now use `OrderedDict` for outputs, meaning that the ordering of the outputs will match the order of the layers in the network in most cases.
- Improved TensorFlow output tensor deduction by excluding certain ops that cannot behave like outputs in TensorFlow.
- Version information is now logged at INFO logging severity.
- Removed the `prepare_inputs`/`prepare_outputs` functions. Instead, runners now do timing on their own in the `infer` function.
- Changed runner inputs to use dictionaries that map input names to their NumPy buffers.
- `polygraphy_exec` will no longer fail if the extension for the model file is unrecognized.
- Added an `fp16_mode` option to `TfRunner` for TF-TRT.
- Added an option to limit TensorFlow GPU memory usage.
- Added an option to specify the minimum segment size to TF-TRT.
- Added an option to write out engine(s) from the TF-TRT graph.
- `polygraphy_exec` now exits when unknown arguments are encountered.
- Improved timestamps to be human-readable instead of using seconds from epoch.
- Added support for dynamic ops in TF-TRT.
- Added an option to write out TensorBoard visualizations.
- Added an option for enabling XLA in the TensorFlow runner.
- Added nicer error messages on failed TF-TRT imports.
- If a TensorFlow graph specifies a dynamic shape, Polygraphy now automatically populates it with concrete values.
- Added argument groups and moved some unstable arguments to the Experimental section.
- Polygraphy will now refuse to write artifacts to disk if a file already exists, wherever it can detect such cases.
- `polygraphy_exec` now emits warnings when unknown command-line parameters are used.
- Added the capability to write out TensorFlow timelines.
- Changed the `--save*` options to accept directory names instead; the resulting files are timestamped and named based on the runner name.
- Changed command-line parameters to use dashes instead of underscores.
- Modified `TrtLegacyRunner` to pass along input order to UFF, instead of permuting the order to CHW.
- The Comparator now prints runner outputs in the same order in which the runners were specified.
- Added per-inference-inputs command-line arguments for running multiple comparisons.
- The seed is now displayed correctly during `Comparator.run()`.
- More user-friendly Comparator output: it now suggests command-line flags to get what you were looking for.
- Added layerwise comparison support for `TrtLegacyRunner` and `TfRunner`.
- Renamed to TRT Polygraphy.
- Overhauled README.md.
- Modified the project structure: created runners, comparator, and logger submodules.
- `polygraphy_exec` now uses the batch size specified by the model if none is specified by the user.
- Added framework dependencies to setup.py.
- `TrtLegacyRunner` now displays ONNX parsing errors and exits early on parsing failures.
- Initial integration.