demo_img2vid: Error Code 1: Cuda Runtime (out of memory) #4353

Open · praveenperfecto opened this issue Feb 10, 2025 · 4 comments
Assignees: kevinch-nv
Labels: Module:DemoDiffusion (issues regarding demoDiffusion) · triaged (issue has been triaged by maintainers) · waiting for feedback (requires more information from user to make progress on the issue)

praveenperfecto commented Feb 10, 2025

CUDA_VISIBLE_DEVICES=0,1 python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --hf-token=$HF_TOKEN --batch-size=1 --use-cuda-graph
/usr/local/lib/python3.12/dist-packages/modelopt/torch/utils/import_utils.py:25: UserWarning: Failed to import apex plugin due to: ImportError("cannot import name 'UnencryptedCookieSessionFactoryConfig' from 'pyramid.session' (unknown location)")
warnings.warn(f"Failed to import {plugin_name} plugin due to: {repr(e)}")
[I] Initializing StableDiffusion img2vid demo using TensorRT
[I] Autoselected scheduler: Euler
[I] Load Scheduler EulerDiscreteScheduler from: pytorch_model/svd-xt-1.1/IMG2VID/eulerdiscretescheduler/scheduler
Building TensorRT engine for onnx-svd-xt-1-1/unet-temp.opt/model.onnx: engine-svd-xt-1-1/unet-temp.trt10.7.0.post1.plan
Strongly typed mode is False for onnx-svd-xt-1-1/unet-temp.opt/model.onnx
/usr/local/lib/python3.12/dist-packages/polygraphy/backend/trt/util.py:590: DeprecationWarning: Use Deprecated in TensorRT 10.1. Superseded by explicit quantization. instead.
calibrator = config.int8_calibrator
[E] [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [tunable_graph.cpp:create:118] autotuning: User allocator error allocating 86114304000-byte buffer
[E] [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [tunable_graph.cpp:create:118] autotuning: User allocator error allocating 86114304000-byte buffer
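A note on the allocation failure above: the autotuner is requesting a single ~86 GB buffer, so one mitigation worth trying, independent of the demo's own flags, is capping the TensorRT builder's workspace pool so that tactics needing more memory than the cap are skipped rather than failing the allocation. Below is a minimal standalone sketch, not the demo's actual build path (the demo builds through Polygraphy); the ONNX path is taken from the log above, the output plan name and the 64 GiB cap are illustrative values.

# Sketch: build the UNet engine with a capped workspace pool so the
# autotuner skips tactics that would exceed the cap instead of OOM-ing.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(0)  # explicit batch (the TRT 10 default)
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("onnx-svd-xt-1-1/unet-temp.opt/model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
# Cap tactic scratch memory at 64 GiB (illustrative; tune to free VRAM).
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 64 * 1024**3)
# If the model has dynamic input shapes, an optimization profile would
# also need to be added here via config.add_optimization_profile(...).
plan = builder.build_serialized_network(network, config)
with open("engine-svd-xt-1-1/unet-temp.capped.plan", "wb") as f:
    f.write(plan)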

praveenperfecto retitled the issue from "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" to "demo_img2vid: Error Code 1: Cuda Runtime (out of memory)" on Feb 10, 2025.
kevinch-nv (Collaborator) commented:

What GPU are you using? Most likely your GPU doesn't have enough VRAM to run this demo.

praveenperfecto (Author) commented Feb 11, 2025:

@kevinch-nv: My system has an NVIDIA H100 GPU with 80 GB of HBM3 memory, but I am hitting an out-of-memory (OOM) error while building the TensorRT engine for onnx-svd-xt-1-1/unet-temp.opt/model.onnx.

The build fails with "Error Code 1: Cuda Runtime (out of memory)": the log shows an allocation request of 86,114,304,000 bytes (~86 GB), which exceeds the 80 GB of GPU memory.

After explicitly passing --batch-size=1 --use-cuda-graph --height 256 --width 512, I was able to build the engine and run inference:

Module       Latency
----------   -----------
VAE-Enc      8.72 ms
CLIP         17.98 ms
UNet x 25    2384.03 ms
VAE-Dec      520.62 ms
----------   -----------
Pipeline     2936.22 ms

Throughput: 20.43 videos/min (25 frames)
Saving video to: img2vid-fp16-None-2611-trt.gif

This is the command I run with two GPUs visible, hoping for multi-GPU execution:
CUDA_VISIBLE_DEVICES=0,1 python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --hf-token=$HF_TOKEN --batch-size=1 --use-cuda-graph

Does demo_img2vid.py explicitly distribute the workload across multiple GPUs? And when using TensorRT, does multi-GPU execution have to be enabled explicitly?
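For context on the multi-GPU question: a serialized TensorRT engine is bound to one GPU, so setting CUDA_VISIBLE_DEVICES=0,1 by itself does not spread the work across both devices; multi-GPU inference means deserializing one engine per device and sharding the work in application code. A minimal sketch of that pattern follows, assuming the engine plan from the log above (demo_img2vid.py itself does not appear to do this):

# Sketch: one TensorRT execution context per visible GPU. TensorRT does
# not split a single engine across devices; sharding is up to the caller.
import tensorrt as trt
import torch  # used only to enumerate and select CUDA devices

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)
engines, contexts = [], []
for dev in range(torch.cuda.device_count()):
    torch.cuda.set_device(dev)  # TensorRT targets the current CUDA device
    with open("engine-svd-xt-1-1/unet-temp.trt10.7.0.post1.plan", "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    engines.append(engine)  # keep each engine alive alongside its context
    contexts.append(engine.create_execution_context())
# Each context now lives on its own GPU; e.g. different frame batches
# could be dispatched to different contexts from application code.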

praveenperfecto (Author) commented:

Hi @kevinch-nv, I wanted to follow up on the CUDA out-of-memory (OOM) issue encountered while building the TensorRT engine for svd-xt-1.1, and to ask whether it is possible to deploy the SVD-XT-1.1 model with Triton Inference Server, given the current setup and TensorRT engine files.
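On the Triton question: a TensorRT .plan file can generally be served by Triton's tensorrt_plan backend, with the caveat that the engine must have been built on the same GPU architecture and with the same TensorRT version that the Triton container ships. A sketch of the expected model-repository layout, with illustrative names (svd_xt_unet is not something the demo produces):

model_repository/
  svd_xt_unet/
    config.pbtxt
    1/
      model.plan   <- the TensorRT engine file, renamed

and a minimal config.pbtxt, assuming an engine built with explicit (non-batched) shapes:

name: "svd_xt_unet"
platform: "tensorrt_plan"
max_batch_size: 0

Newer Triton releases can auto-complete input/output metadata for tensorrt_plan models, so the config can often stay this small. Note this serves individual engines; the demo's full pipeline (CLIP, the UNet denoising loop, VAE) would still need orchestration, for example a Triton ensemble or the Python backend.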

praveenperfecto (Author) commented:

Hi @kevinch-nv,

I tried with FP8: python3 demo_img2vid.py --version svd-xt-1.1 --onnx-dir onnx-svd-xt-1-1 --engine-dir engine-svd-xt-1-1 --hf-token=$HF_TOKEN --fp8

[E] Error Code: 9: Skipping tactic 0x0000000000000001 due to exception Unsupported data type FP8.
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception Assertion idx < kNB_PACKED_KERNELS failed.
[E] Error Code: 9: Skipping tactic 0x0000000000000001 due to exception Unsupported data type FP8.
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 236: quantize: /down_blocks_0/resnets_0/spatial_res_block/conv1/weight_quantizer/QuantizeLinear_output_0'-(fp8[320,320,3,3][]so[], mem_prop=0) | down_blocks_0_resnets_0_spatial_res_block_conv1_weight_constantFloat-{0.0206451, -0.0167847, -0.0323792, -0.0221558, 0.0266113, -0.0697021, 0.03479, 0.0248413, ...}(f32[320,320,3,3][2880,9,3,1]so[3,2,1,0], mem_prop=0), /down_blocks_0/resnets_0/spatial_res_block/conv1/weight_quantizer/QuantizeLinear scale weightsHalf-0.00186539H:(f16[][]so[], mem_prop=0), stream = 0 // /down_blocks.0/resnets.0/spatial_res_block/conv1/weight_quantizer/QuantizeLinear, axis = 0, No matching rules found for input operand types
[E] Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [type.cpp:infer_type:145] Could not infer output types for operation: 1124: quantize: /down_blocks_0/resnets_0/spatial_res_block/conv2/weight_quantizer/QuantizeLinear_output_0'-(fp8[320,320,3,3][]so[], mem_prop=0) | down_blocks_0_resnets_0_spatial_res_block_conv2_weight_constantFloat-{0.0209808, 0.0167542, 0.0894775, -0.00762939, 0.0802002, 0.072998, 0.0122223, 0.125, ...}(f32[320,320,3,3][2880,9,3,1]so[3,2,1,0], mem_prop=0), /down_blocks_0/resnets_0/spatial_res_block/conv2/weight_quantizer/QuantizeLinear scale weightsHalf-0.00117111H:(f16[][]so[], mem_prop=0), stream = 0 // /down_blocks.0/resnets.0/spatial_res_block/conv2/weight_quantizer/QuantizeLinear, axis = 0, No matching rules found for input operand types
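The "Unsupported data type FP8" and failed type-inference messages above are tactic/type-support errors rather than OOM; H100 hardware does support FP8, so this looks like a gap between the demo's quantized ONNX export and what the installed TensorRT build accepts for these Conv layers. A quick hedged sanity check of what the local bindings expose (if either hasattr check fails, the install predates FP8 support):

# Sketch: confirm the installed TensorRT bindings expose FP8 at all.
import tensorrt as trt

print("TensorRT:", trt.__version__)
print("FP8 builder flag present:", hasattr(trt.BuilderFlag, "FP8"))
print("FP8 data type present:", hasattr(trt.DataType, "FP8"))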
