After converting an ONNX FP32 model to an INT8 engine with custom calibration, the engine layers still show FP32 #4341
Labels
Module:Polygraphy (Issues with Polygraphy)
triaged (Issue has been triaged by maintainers)
waiting for feedback (Requires more information from user to make progress on the issue)
Description
I tried to follow the INT8 custom calibration example below to build an INT8 engine from an ONNX FP32 model:
https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/convert/01_int8_calibration_in_tensorrt
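For context, the calibration followed the pattern from that example: a small data-loader script that yields feed dicts, passed to polygraphy convert. This is a sketch rather than the exact code; the input name, shape, and batch count are placeholders:

# data_loader.py: yields calibration batches for Polygraphy
import numpy as np

def load_data():
    for _ in range(10):
        # Placeholder input name/shape; replace with the model's real ones.
        yield {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}

and then:

polygraphy convert fp32_model.onnx --int8 --data-loader-script ./data_loader.py --calibration-cache int8_calib.cache -o int8.engine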
After building the engine, I used the following command to inspect the layers:
polygraphy inspect model int8.engine --model-type engine --show layers
However, all of the layers still run in FP32.
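The layer precisions can also be cross-checked with TensorRT's engine inspector directly. A minimal sketch, assuming the TensorRT 10.x Python bindings and the engine file above; note that per-layer details are only complete if the engine was built with detailed profiling verbosity:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("int8.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

# Prints one line per layer, including the tactic/precision TensorRT chose
# (detailed output requires detailed profiling verbosity at build time).
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.ONELINE))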
Moreover, I tried the debug precision tool to investigate which layers differ and to build a mixed-precision engine, but the result is the same, and inference is much slower than the ONNX FP32 model:
CUDA_VISIBLE_DEVICES=3 polygraphy debug precision fp32_model.onnx --int8 --tactic-sources cublas --verbose -p float32 --calibration-cache int8_calib.cache --check polygraphy run polygraphy_debug.engine --trt --load-inputs golden_input.json --load-outputs golden.json --abs 1e-2
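(The golden_input.json / golden.json files referenced above were presumably generated from the FP32 ONNX model with ONNX-Runtime, along these lines:)

polygraphy run fp32_model.onnx --onnxrt --save-inputs golden_input.json --save-outputs golden.json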
Environment
TensorRT Version: 10.4
NVIDIA GPU: A100
NVIDIA Driver Version:
CUDA Version: 12.5
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example, run the ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):