Writing to parquet does not release memory #550
-
When writing to parquet inside of a parallel process, the memory used is never released, regardless of explicit use of …. Here's a reproducible example:
I suspect this is because the C++ library is using threads internally. In my use case I often run up against this: the memory just "hangs on" over and over as the overall process runs, eventually leading to an OOM kill. Using …
-
If you have the option to choose, maybe you're better off using `plan(future.callr::callr, ...)`, which uses a temporary, independent R process for each future that is shut down after the future completes. The downside is more overhead. I guess the arrow folks are better suited to answer the question about memory not being released. If there's memory creep in parallel workers, I suspect it'll also happen in sequential mode - it'll just take longer to notice there, since it's a single process whose memory is growing instead of multiple.
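For concreteness, the `future.callr::callr` suggestion might look something like the sketch below. This is an untested illustration, not the original poster's code: the data frame, file names, and `future_lapply` loop are all made up for the example.

```r
# Sketch: run each future in a disposable callr-backed R process, so any
# memory held by arrow's internal C++ threads is freed when the worker exits.
# Assumes the future, future.apply, future.callr, and arrow packages are
# installed; the data and file names are illustrative.
library(future.apply)
library(arrow)

# One fresh R process per future, torn down after the future completes.
plan(future.callr::callr)

future_lapply(1:4, function(i) {
  df <- data.frame(x = rnorm(1e5))
  arrow::write_parquet(df, sprintf("part-%d.parquet", i))
  NULL  # avoid shipping large objects back to the main session
})
```

The trade-off is process startup overhead on every future, but in exchange no worker process lives long enough to accumulate unreleased memory.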
-
Also, I was intrigued by the apparent memory leak, so I played around with this some. If I replace …, … So it's not clear to me that this is because of something …