demo_img2vid: Error Code 1: Cuda Runtime (out of memory) #4353

Open · praveenperfecto opened this issue Feb 10, 2025 · 4 comments
Assignees: kevinch-nv
Labels: Module:DemoDiffusion (issues regarding demoDiffusion) · triaged (issue has been triaged by maintainers) · waiting for feedback (requires more information from user to make progress on the issue)

praveenperfecto commented Feb 10, 2025

CUDA_VISIBLE_DEVICES=0,1 python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --hf-token=$HF_TOKEN --batch-size=1 --use-cuda-graph
/usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:25: UserWarning: Failed to import apex plugin due to: ImportError("cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)")
warnings.warn(f"Failed to import {plugin_name} plugin due to: {repr(e)}")
[I] Initializing StableDiffusion img2vid demo using TensorRT
[I] Autoselected scheduler: Euler
[I] Load Scheduler EulerDiscreteScheduler from: pytorch_model/svd-xt-1.1/IMG2VID/eulerdiscretescheduler/scheduler
Building TensorRT engine for onnx-svd-xt-1-1/unet-temp.opt/model.onnx: engine-svd-xt-1-1/unet-temp.trt10.7.0.post1.plan
Strongly typed mode is False for onnx-svd-xt-1-1/unet-temp.opt/model.onnx
/usr/local/lib/python3.12/dist-packages/polygraphy/backend/trt/util.py:590: DeprecationWarning: Use Deprecated in TensorRT 10.1. Superseded by explicit quantization. instead.
calibrator = config.int8_calibrator
[E] [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [tunable_graph.cpp:create:118] autotuning: User allocator error allocating 86114304000-byte buffer
[E] [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [tunable_graph.cpp:create:118] autotuning: User allocator error allocating 86114304000-byte buffer
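A note on the allocation failure above: the autotuner is requesting a single ~86 GB buffer, so one mitigation worth trying, independent of the demo's own flags, is capping the TensorRT builder's workspace pool so that tactics needing more memory than the cap are skipped rather than failing the allocation. Below is a minimal standalone sketch, not the demo's actual build path (the demo builds through Polygraphy); the ONNX path is taken from the log above, the output plan name and the 64 GiB cap are illustrative values.

# Sketch: build the UNet engine with a capped workspace pool so the
# autotuner skips tactics that would exceed the cap instead of OOM-ing.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(0)  # explicit batch (the TRT 10 default)
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("onnx-svd-xt-1-1/unet-temp.opt/model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
# Cap tactic scratch memory at 64 GiB (illustrative; tune to free VRAM).
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 64 * 1024**3)
# If the model has dynamic input shapes, an optimization profile would
# also need to be added here via config.add_optimization_profile(...).
plan = builder.build_serialized_network(network, config)
with open("engine-svd-xt-1-1/unet-temp.capped.plan", "wb") as f:
    f.write(plan)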

praveenperfecto retitled the issue from "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" to "demo_img2vid: Error Code 1: Cuda Runtime (out of memory)" on Feb 10, 2025.
kevinch-nv (Collaborator) commented:

What GPU are you using? Most likely your GPU doesn't have enough VRAM to run this demo.

praveenperfecto (Author) commented Feb 11, 2025:

@kevinch-nv: My system has an NVIDIA H100 GPU with 80 GB of HBM3 memory, but I am hitting an out-of-memory (OOM) error while building the TensorRT engine for onnx-svd-xt-1-1/unet-temp.opt/model.onnx.

The build fails with "Error Code 1: Cuda Runtime (out of memory)": the log shows an allocation request of 86,114,304,000 bytes (~86 GB), which exceeds the 80 GB of GPU memory.

After explicitly passing --batch-size=1 --use-cuda-graph --height 256 --width 512, I was able to build the engine and run inference:

Module       Latency
----------   -----------
VAE-Enc      8.72 ms
CLIP         17.98 ms
UNet x 25    2384.03 ms
VAE-Dec      520.62 ms
----------   -----------
Pipeline     2936.22 ms

Throughput: 20.43 videos/min (25 frames)
Saving video to: img2vid-fp16-None-2611-trt.gif

This is the command I run with two GPUs visible, hoping for multi-GPU execution:
CUDA_VISIBLE_DEVICES=0,1 python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --hf-token=$HF_TOKEN --batch-size=1 --use-cuda-graph

Does demo_img2vid.py explicitly distribute the workload across multiple GPUs? And when using TensorRT, does multi-GPU execution have to be enabled explicitly?
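For context on the multi-GPU question: a serialized TensorRT engine is bound to one GPU, so setting CUDA_VISIBLE_DEVICES=0,1 by itself does not spread the work across both devices; multi-GPU inference means deserializing one engine per device and sharding the work in application code. A minimal sketch of that pattern follows, assuming the engine plan from the log above (demo_img2vid.py itself does not appear to do this):

# Sketch: one TensorRT execution context per visible GPU. TensorRT does
# not split a single engine across devices; sharding is up to the caller.
import tensorrt as trt
import torch  # used only to enumerate and select CUDA devices

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)
engines, contexts = [], []
for dev in range(torch.cuda.device_count()):
    torch.cuda.set_device(dev)  # TensorRT targets the current CUDA device
    with open("engine-svd-xt-1-1/unet-temp.trt10.7.0.post1.plan", "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    engines.append(engine)  # keep each engine alive alongside its context
    contexts.append(engine.create_execution_context())
# Each context now lives on its own GPU; e.g. different frame batches
# could be dispatched to different contexts from application code.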

praveenperfecto (Author) commented:

Hi @kevinch-nv, I wanted to follow up on the CUDA out-of-memory (OOM) issue encountered while building the TensorRT engine for svd-xt-1.1, and to ask whether it is possible to deploy the SVD-XT-1.1 model with Triton Inference Server, given the current setup and TensorRT engine files.
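On the Triton question: a TensorRT .plan file can generally be served by Triton's tensorrt_plan backend, with the caveat that the engine must have been built on the same GPU architecture and with the same TensorRT version that the Triton container ships. A sketch of the expected model-repository layout, with illustrative names (svd_xt_unet is not something the demo produces):

model_repository/
  svd_xt_unet/
    config.pbtxt
    1/
      model.plan   <- the TensorRT engine file, renamed

and a minimal config.pbtxt, assuming an engine built with explicit (non-batched) shapes:

name: "svd_xt_unet"
platform: "tensorrt_plan"
max_batch_size: 0

Newer Triton releases can auto-complete input/output metadata for tensorrt_plan models, so the config can often stay this small. Note this serves individual engines; the demo's full pipeline (CLIP, the UNet denoising loop, VAE) would still need orchestration, for example a Triton ensemble or the Python backend.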

praveenperfecto (Author) commented:

Hi @kevinch-nv,

I tried with FP8: python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --hf-token=$HF_TOKEN --fp8

[E] Error Code: 9: Skipping tactic 0x0000000000000001 due to exception Unsupported data type FP8.
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception Assertion idx < kNB_PACKED_KERNELS failed.
[E] Error Code: 9: Skipping tactic 0x0000000000000001 due to exception Unsupported data type FP8.
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 236: quantize: /down_blocks_0/resnets_0/spatial_res_block/conv1/weight_quantizer/QuantizeLinear_output_0'-(fp8[320,320,3,3][]so[], mem_prop=0) | down_blocks_0_resnets_0_spatial_res_block_conv1_weight_constantFloat-{0.0206451, -0.0167847, -0.0323792, -0.0221558, 0.0266113, -0.0697021, 0.03479, 0.0248413, ...}(f32[320,320,3,3][2880,9,3,1]so[3,2,1,0], mem_prop=0), /down_blocks_0/resnets_0/spatial_res_block/conv1/weight_quantizer/QuantizeLinear scale weightsHalf-0.00186539H:(f16[][]so[], mem_prop=0), stream = 0 // /down_blocks.0/resnets.0/spatial_res_block/conv1/weight_quantizer/QuantizeLinear, axis = 0, No matching rules found for input operand types
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 1124: quantize: /down_blocks_0/resnets_0/spatial_res_block/conv2/weight_quantizer/QuantizeLinear_output_0'-(fp8[320,320,3,3][]so[], mem_prop=0) | down_blocks_0_resnets_0_spatial_res_block_conv2_weight_constantFloat-{0.0209808, 0.0167542, 0.0894775, -0.00762939, 0.0802002, 0.072998, 0.0122223, 0.125, ...}(f32[320,320,3,3][2880,9,3,1]so[3,2,1,0], mem_prop=0), /down_blocks_0/resnets_0/spatial_res_block/conv2/weight_quantizer/QuantizeLinear scale weightsHalf-0.00117111H:(f16[][]so[], mem_prop=0), stream = 0 // /down_blocks.0/resnets.0/spatial_res_block/conv2/weight_quantizer/QuantizeLinear, axis = 0, No matching rules found for input operand types
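The "Unsupported data type FP8" and failed type-inference messages above are tactic/type-support errors rather than OOM; H100 hardware does support FP8, so this looks like a gap between the demo's quantized ONNX export and what the installed TensorRT build accepts for these Conv layers. A quick hedged sanity check of what the local bindings expose (if either hasattr check fails, the install predates FP8 support):

# Sketch: confirm the installed TensorRT bindings expose FP8 at all.
import tensorrt as trt

print("TensorRT:", trt.__version__)
print("FP8 builder flag present:", hasattr(trt.BuilderFlag, "FP8"))
print("FP8 data type present:", hasattr(trt.DataType, "FP8"))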
