Help! Unbalanced GPU memory usage in custom VoxelNext model. #2975

Mumuqiao opened this issue May 7, 2024 · 0 comments
Mumuqiao commented May 7, 2024

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmdetection3d

Environment

sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.64
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX512
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.2
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.2
OpenCV: 4.9.0
MMEngine: 0.10.4
MMDetection: 3.3.0
MMDetection3D: 1.4.0+fe25f7a
spconv2.0: True

Reproduces the problem - code sample

I'm trying to reproduce VoxelNeXt in mmdetection3d. I ported the VoxelNeXt code from pull request 2692 to the current main branch and fixed several bugs in it. The code now trains successfully, but I noticed that GPU memory usage is quite unbalanced across the eight GPUs. Is this normal?
When I train CenterPoint on the same machine, memory usage is balanced across all GPUs.
Does anyone know what factors could cause this?
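
To quantify the imbalance rather than just eyeballing it, here is a small diagnostic sketch I can drop into the training process. This is illustrative code only, not part of the VoxelNeXt port; `log_rank_memory` is a name made up for this example.

```python
# Diagnostic sketch (not part of the VoxelNeXt port): gather each rank's peak
# allocated GPU memory so the imbalance can be quantified in the training log.
import torch
import torch.distributed as dist


def log_rank_memory():
    """Print the peak allocated memory (MiB) of every rank from rank 0."""
    if not (dist.is_available() and dist.is_initialized()):
        return
    peak_mib = torch.cuda.max_memory_allocated() / 1024 ** 2
    gathered = [None] * dist.get_world_size()
    # all_gather_object collects the (rank, peak) pair from every process.
    dist.all_gather_object(gathered, (dist.get_rank(), peak_mib))
    if dist.get_rank() == 0:
        for rank, mib in sorted(gathered):
            print(f'rank {rank}: peak allocated {mib:.0f} MiB')
```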

Reproduces the problem - command or script

```shell
./tools/dist_train.sh ./configs/voxelnext/voxelnext_voxel0075_second_secfpn_8xb4-cyclic-20e_nus-3d.py 8
```
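
One way to call the `log_rank_memory` sketch above periodically during the run launched by this command is a small custom hook. Again, this is only a sketch: `MemoryLogHook` is a hypothetical name, and it assumes MMEngine's standard `Hook` interface.

```python
# Sketch of a custom hook (hypothetical name `MemoryLogHook`) that calls the
# log_rank_memory helper above every `interval` training iterations.
from mmengine.hooks import Hook
from mmengine.registry import HOOKS


@HOOKS.register_module()
class MemoryLogHook(Hook):
    """Periodically print per-rank peak GPU memory during training."""

    def __init__(self, interval: int = 50):
        self.interval = interval

    def after_train_iter(self, runner, batch_idx, data_batch=None, outputs=None):
        if (batch_idx + 1) % self.interval == 0:
            log_rank_memory()
```

It would then be enabled from the config with something like `custom_hooks = [dict(type='MemoryLogHook', interval=50)]`.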

Reproduces the problem - error message

[Screenshot attached to the original issue: per-GPU memory usage showing the imbalance across the 8 GPUs.]

Additional information

No response

Mumuqiao changed the title from "Help! Unbalanced GPU memory usage in custom model." to "Help! Unbalanced GPU memory usage in custom VoxelNext model." on May 8, 2024.