I am implementing an autoregressive model that requires a for loop, but I have run into a problem when running the model on the GPU (and on the CPU). For large data there is a threshold (at some iteration i of the loop) where the runtime suddenly increases many-fold and memory starts to fill up. Here is a minimal example (reproducing it may depend on the data and the GPU):
library(torch)

device = "cuda:0"
B = torch::torch_rand(size = c(1000L, 500L, 100L), device = device)
A = torch_ones_like(B, device = device)
Parameter = torch_tensor(0.1, requires_grad = TRUE, device = device)

res = numeric(100)
for (e in 1:100) {
  print(e)
  tt = system.time({
    pred = 1 - torch_sigmoid(B + (1.0 - Parameter * B))
    A = A + pred
    # A$add_(pred)  # in-place does not help
  })
  res[e] = tt[3]
}

plot(res, xlab = "epochs", ylab = "runtime")
Any ideas what might be happening? (The problem, i.e. the memory leak and the runtime increase, also occurs on the CPU, but not as severely.)
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] torch_0.13.0
loaded via a namespace (and not attached):
[1] processx_3.8.0 bit_4.0.5 FINN_0.0.900 compiler_4.2.3 R6_2.5.1 magrittr_2.0.3 cli_3.6.1 tools_4.2.3 rstudioapi_0.14
[10] Rcpp_1.0.10 bit64_4.0.5 coro_1.0.3 callr_3.7.3 ps_1.7.3 rlang_1.1.3
GPU: NVIDIA A5000
Cuda: 11.7
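For what it's worth, one way to check whether autograd graph accumulation is the culprit is to rerun the same loop with gradient tracking disabled; if runtime and memory then stay flat, the growing graph is the cause. A minimal sketch of that check (same shapes as the example above; the `with_no_grad()` wrapper is the only change):

```r
library(torch)

device = if (cuda_is_available()) "cuda:0" else "cpu"
B = torch_rand(size = c(1000L, 500L, 100L), device = device)
A = torch_ones_like(B)
Parameter = torch_tensor(0.1, requires_grad = TRUE, device = device)

res = numeric(100)
for (e in 1:100) {
  tt = system.time({
    with_no_grad({  # nothing inside here is recorded in the autograd graph
      pred = 1 - torch_sigmoid(B + (1.0 - Parameter * B))
      A = A + pred
    })
  })
  res[e] = tt[3]
}
plot(res, xlab = "epochs", ylab = "runtime")
```

This only diagnoses the problem; it disables gradients entirely, so it is not usable for training as-is.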
I believe this is expected, unfortunately. When building autoregressive models, since tensors with requires_grad = TRUE participate in the computation, torch stores the full computation graph in order to be able to (at some point) compute the derivative of A with respect to Parameter. Memory usage therefore keeps growing with every iteration of the loop.
Can you post how you are training your model? A common source of this issue is that you actually need to call A$detach() at some point to avoid holding the full graph of computations.
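A minimal sketch of that detach pattern, in the style of truncated backpropagation: backpropagate every k steps, then cut the graph with $detach() so it cannot keep growing. The target tensor, the MSE loss, and the Adam optimizer here are all hypothetical, chosen only to make the example self-contained:

```r
library(torch)

device = if (cuda_is_available()) "cuda:0" else "cpu"
B = torch_rand(c(100L, 50L, 10L), device = device)
target = torch_zeros_like(B)  # hypothetical training target
Parameter = torch_tensor(0.1, requires_grad = TRUE, device = device)
opt = optim_adam(list(Parameter), lr = 0.01)

A = torch_ones_like(B)
k = 10  # backpropagate and detach every k steps
for (e in 1:100) {
  pred = 1 - torch_sigmoid(B + (1.0 - Parameter * B))
  A = A + pred
  if (e %% k == 0) {
    loss = nnf_mse_loss(A, target)  # hypothetical loss
    opt$zero_grad()
    loss$backward()
    opt$step()
    A = A$detach()  # cut the graph: gradients no longer flow past this point
  }
}
```

With this pattern the graph only ever spans the last k iterations, so memory stays bounded; the trade-off is that gradients cannot flow across detach boundaries.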