
After converting an ONNX FP32 model to an INT8 engine with custom calibration, the engine layers still show FP32 #4341

Open
jinhonglu opened this issue Jan 28, 2025 · 2 comments
Assignees: kevinch-nv
Labels: Module:Polygraphy, triaged, waiting for feedback

Comments

@jinhonglu

jinhonglu commented Jan 28, 2025

Description

I tried to follow the INT8 custom calibration example to build an INT8 engine from an ONNX FP32 model:
https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/convert/01_int8_calibration_in_tensorrt
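
For context, a minimal sketch of the build step from that example, assuming the placeholder names fp32_model.onnx, data_loader.py, and int8_calib.cache:

# Build an INT8 engine, calibrating with data supplied by a Polygraphy data-loader script
polygraphy convert fp32_model.onnx --int8 \
    --data-loader-script ./data_loader.py \
    --calibration-cache int8_calib.cache \
    -o int8.engine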

After building the engine, I used the following command to inspect its layers:
polygraphy inspect model int8.engine --model-type engine --show layers
However, all of the layers still report FP32.

Moreover, I tried debug precision to investigate the per-layer differences and build a mixed-precision engine. The result is the same, and inference becomes much slower than with the ONNX FP32 model:
CUDA_VISIBLE_DEVICES=3 polygraphy debug precision fp32_model.onnx --int8 --tactic-sources cublas --verbose -p float32 --calibration-cache int8_calib.cache --check polygraphy run polygraphy_debug.engine --trt --load-inputs golden_input.json --load-outputs golden.json --abs 1e-2
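
The golden_input.json and golden.json files referenced above are reference inputs/outputs; a minimal sketch of how they could be produced with ONNX Runtime (an assumption, since the original post does not show this step):

# Save reference inputs and outputs from an ONNX Runtime run for later comparison
polygraphy run fp32_model.onnx --onnxrt \
    --save-inputs golden_input.json \
    --save-outputs golden.json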

Environment

TensorRT Version: 10.4

NVIDIA GPU: A100

NVIDIA Driver Version:

CUDA Version: 12.5

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

@kevinch-nv self-assigned this Feb 10, 2025
@kevinch-nv added the Module:Polygraphy, triaged, and waiting for feedback labels Feb 10, 2025
@kevinch-nv
Collaborator

Can you share your model? Testing this workflow with https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/model/resnet50-v1-12.onnx, the output is as expected.

@jinhonglu
Author

jinhonglu commented Feb 11, 2025

The model is quite big, so I am unable to upload it here. Is there another way to share it with you?

The output of the INT8 engine matches the FP32 ONNX model, but I am wondering why the layers of the INT8 engine still show FP32 when inspected with Polygraphy.

Also, the inference time is roughly three times that of the FP32 ONNX model.
