
GPU runtime + memory leakage #1184

Open
MaximilianPi opened this issue Aug 5, 2024 · 1 comment
Comments

@MaximilianPi
Contributor

Hi @dfalbel,

I am implementing an autoregressive model, which requires a for loop, but I have run into a problem when running the model on the GPU (and, less severely, on the CPU). For large data there is a threshold (at some iteration i of the loop) where the runtime suddenly increases many times over and memory starts to fill up. Here is a minimal example (reproducing it may depend on the data and the GPU):

library(torch)

device = "cuda:0"
B = torch_rand(size = c(1000L, 500L, 100L), device = device)
A = torch_ones_like(B)  # inherits device and dtype from B
Parameter = torch_tensor(0.1, requires_grad = TRUE, device = device)

res = numeric(100)  # per-iteration wall-clock times
for (e in 1:100) {
  print(e)
  tt = system.time({
    pred = 1 - torch_sigmoid(B + (1.0 - Parameter * B))
    A = A + pred
    # A$add_(pred)  # in-place does not help
  })
  res[e] = tt[3]  # elapsed time
}

plot(res, xlab = "epoch", ylab = "runtime (s)")

[plot: per-iteration runtime against epoch]

Any ideas what might be happening? (The memory leakage and the slowdown also occur on the CPU, but not as severely.)

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] torch_0.13.0

loaded via a namespace (and not attached):
 [1] processx_3.8.0  bit_4.0.5       FINN_0.0.900    compiler_4.2.3  R6_2.5.1        magrittr_2.0.3  cli_3.6.1       tools_4.2.3     rstudioapi_0.14
[10] Rcpp_1.0.10     bit64_4.0.5     coro_1.0.3      callr_3.7.3     ps_1.7.3        rlang_1.1.3    

GPU: NVIDIA A5000
Cuda: 11.7


dfalbel commented Aug 5, 2024

Hi @MaximilianPi ,

I believe this is expected, unfortunately. When building autoregressive models, since tensors with requires_grad = TRUE participate in the computation, torch stores the full computation graph so that it can (at some point) compute the derivative of A with respect to Parameter. Because A accumulates across iterations, that graph grows with every iteration, and memory usage grows with it.

The problem might be more visible on the GPU, because at some point we start calling R's GC on every iteration to try to free more memory. You can read more about how to tune this here: https://torch.mlverse.org/docs/articles/memory-management#cuda

Can you post how you are training your model? A common source of this issue is that you actually need to call A$detach() at some point, so that you stop holding the full graph of computations.
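For illustration, a sketch of what that could look like in the loop from the original post. It assumes gradients for Parameter are consumed within each iteration (e.g. by a per-step loss and backward pass, which are hypothetical here), so the graph can be cut before A is carried forward:

```r
library(torch)

device = "cuda:0"
B = torch_rand(size = c(1000L, 500L, 100L), device = device)
A = torch_ones_like(B)
Parameter = torch_tensor(0.1, requires_grad = TRUE, device = device)

for (e in 1:100) {
  pred = 1 - torch_sigmoid(B + (1.0 - Parameter * B))
  A = A + pred
  # ... if a loss is computed from A, call loss$backward() here ...
  A = A$detach()  # cut the graph so it does not accumulate across iterations
}
```

With the detach in place, each iteration only ever holds the graph for a single step, so runtime and memory stay roughly constant instead of growing with e.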
