
After converting an ONNX FP32 model to an INT8 engine with custom calibration, the engine layers still show FP32 #4341

Open
jinhonglu opened this issue Jan 28, 2025 · 2 comments
Assignees: kevinch-nv
Labels: Module:Polygraphy, triaged, waiting for feedback

Comments

@jinhonglu

jinhonglu commented Jan 28, 2025

Description

I tried to follow the INT8 custom calibration example to build an INT8 engine from an ONNX FP32 model:
https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/convert/01_int8_calibration_in_tensorrt
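
For context, a minimal sketch of the build step from that example, assuming the placeholder names fp32_model.onnx, data_loader.py, and int8_calib.cache:

# Build an INT8 engine, calibrating with data supplied by a Polygraphy data-loader script
polygraphy convert fp32_model.onnx --int8 \
    --data-loader-script ./data_loader.py \
    --calibration-cache int8_calib.cache \
    -o int8.engine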

After building the engine, I used the following command to inspect its layers:
polygraphy inspect model int8.engine --model-type engine --show layers
However, all of the layers still report FP32.

Moreover, I tried debug precision to investigate the per-layer differences and build a mixed-precision engine. The result is the same, and inference becomes much slower than with the ONNX FP32 model:
CUDA_VISIBLE_DEVICES=3 polygraphy debug precision fp32_model.onnx --int8 --tactic-sources cublas --verbose -p float32 --calibration-cache int8_calib.cache --check polygraphy run polygraphy_debug.engine --trt --load-inputs golden_input.json --load-outputs golden.json --abs 1e-2
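
The golden_input.json and golden.json files referenced above are reference inputs/outputs; a minimal sketch of how they could be produced with ONNX Runtime (an assumption, since the original post does not show this step):

# Save reference inputs and outputs from an ONNX Runtime run for later comparison
polygraphy run fp32_model.onnx --onnxrt \
    --save-inputs golden_input.json \
    --save-outputs golden.json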

Environment

TensorRT Version: 10.4

NVIDIA GPU: A100

NVIDIA Driver Version:

CUDA Version: 12.5

CUDNN Version:

Operating System:

Python Version (if applicable):

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

@kevinch-nv self-assigned this Feb 10, 2025
@kevinch-nv added the Module:Polygraphy, triaged, and waiting for feedback labels Feb 10, 2025
@kevinch-nv
Collaborator

Can you share your model? Testing this workflow with https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/model/resnet50-v1-12.onnx, the output is as expected.

@jinhonglu
Author

jinhonglu commented Feb 11, 2025

The model is quite big, so I am unable to upload it here. Is there another way to share it with you?

The output of the INT8 engine matches the FP32 ONNX model, but I am wondering why the layers of the INT8 engine still show FP32 when inspected with Polygraphy.

Also, the inference time is roughly three times that of the FP32 ONNX model.
