
Use of serializedSize uses significant memory #760

Closed
fproske opened this issue Feb 3, 2025 · 5 comments

fproske commented Feb 3, 2025

Describe the bug

Upgrading from future version 1.33.2 to version 1.34.0 results in OOM issues when running in a container. Profiling the code showed that the serializedSize function allocates significant amounts of memory. It's hard for me to create a reproducible example, as some of the globals I'm providing to the future() call are nested R6 objects. Is it discouraged to send such complex data structures?

[Image: memory profiling output]

Expected behavior

Significantly lower memory usage, as in previous versions.

Session information

Occurs both on my macOS ARM machine and on Linux pods in a Kubernetes cluster. future_1.34.0, parallelly_1.42.0, globals_0.16.3

@HenrikBengtsson (Collaborator)

Thanks for reporting.

> Upgrading from future version 1.33.2 to version 1.34.0 results in OOM issues running in container.

Yes, the built-in size estimator of global variables was switched from an internal ad-hoc approach to parallelly::serializedSize() in future 1.34.0.

Before anything else, there is of course the possibility that other things were updated too when you upgraded the future package. Did you verify that the OOM problem goes away if you re-install future 1.33.2? You can do that quickly with `install.packages("https://cran.r-project.org/src/contrib/Archive/future/future_1.33.2.tar.gz")`. It would be useful to rule out other causes.

After having verified that it is indeed specific to future 1.34.0, please see if the problem goes away if you switch back to the previous estimator by setting:

```r
options(future.globals.objectSize.method = "objectSize")
```

An alternative is to set the environment variable

```sh
export R_FUTURE_GLOBALS_OBJECTSIZE_METHOD="objectSize"
```

before starting R. This will be detected and parsed by future when the package loads, and the corresponding R option will be set.

Your feedback on this will be critical to understanding and fixing the problem. Based on that, I will be able to suggest the best workaround until it is resolved.

PS. You might indeed be onto something, so I created futureverse/parallelly#126 to see if parallelly::serializedSize() can be optimized further.
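For anyone wanting to compare the two size-estimation approaches directly, here is a minimal sketch. It uses `utils::object.size()` only as a rough stand-in for the old internal estimator (the real one was ad hoc and is internal to future), and a made-up nested list in place of the actual globals:

```r
# A nested object standing in for the real globals.
x <- list(a = rnorm(1e6), b = list(c = rnorm(1e6)))

# Classic recursive estimator (proxy for the pre-1.34.0 approach):
utils::object.size(x)

# New approach: size of the object when serialized for transfer to a worker.
parallelly::serializedSize(x)
```

For plain atomic data like this, the two numbers should be close; the interesting differences show up with environment-backed objects.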

fproske added a commit to GAMS-dev/miro that referenced this issue Feb 10, 2025
With update to future version 1.34.0, memory usage increased significantly due to the use of `parallelly::serializedSize`. I created an [issue](futureverse/future#760) and downgraded future for now

fproske commented Feb 11, 2025

Sorry for the late response; it took a while to reproduce the issue and run some experiments. Our initial issue was that, after upgrading to future 1.34.0, we got a warning that the size of globals would exceed 500 MB. We had never seen this warning before, and downgrading only the future package also resolved it (as did staying on 1.34.0 and setting `options(future.globals.maxSize = Inf)`, which is our current workaround). Profiling the process led to the discovery mentioned in the initial post.

We have done some more experiments running our application in a memory-restricted environment (using Docker). Before initiating the future() (using multicore), our application requires around 400 MB of memory as reported by top, so we set the memory limit of the Docker container to 500m.
With globals size checking enabled (the default, `options(future.globals.maxSize = 500 * 1024^2)`), we get an error that the objects we are trying to allocate would exceed 500 MB. However, when we disable the check (`options(future.globals.maxSize = Inf)`) and run the exact same experiment, everything works flawlessly. The process is not OOM killed.
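For reference, our workaround amounts to the following (the 1 GiB alternative is only an example figure, not what we use):

```r
# Disable future's check on the total size of exported globals entirely:
options(future.globals.maxSize = Inf)

# ... or, less drastically, raise the limit to e.g. 1 GiB:
options(future.globals.maxSize = 1024^3)
```

Raising the limit instead of disabling it keeps some protection against accidentally shipping huge objects to remote workers.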

To conclude: I'm not sure why profvis reports such high memory usage for the serializedSize function, but I guess the more important question is whether the size reported by the new function is accurate.

For us, the issue is resolved, as we are happy disabling this check (we were simply not aware that such an artificial size restriction existed before the issue started appearing), but we are happy to run additional experiments if that helps with debugging. If you conclude that everything works as expected, feel free to close this.

PS: Thanks for all the work you do with the future package. It is very useful to us!


HenrikBengtsson commented Feb 11, 2025

So, the background for having that error on "the size of globals would exceed 500 MB" is to protect users from sending ginormous objects over the internet by mistake. Since parallelization is so easy these days, it is not unlikely that someone would end up sending tens or hundreds of GBs over the network without knowing. The only thing they would notice is that it takes a very long time. They might also get a notification from a sysadmin, and worse, if they're on a metered internet connection, which some people still are, they could get a hefty bill.

> We have never seen this warning ...

So, that's one critique of this built-in protection. No matter what the threshold is, there will always be users operating just below the limit, until one day they work with some data that pushes them just over it, and boom, they get a surprising error. That sends them down a troubleshooting path, and, worse, it might happen at the very end of a long-running pipeline. So, it wastes time and effort. This is what happened to you (although here it was due to a change of the internals and not the data per se).

I've been thinking about this since day one, and I'm considering removing this protection, i.e. making `options(future.globals.maxSize = +Inf)` the default, and then documenting the risk of parallelizing out over the internet. It's quite likely I'll do this in the next release.

> ... I guess the more important issue/question is whether the memory reported by the new function is accurate.

... or, equivalently, whether the old objectSize algorithm is accurate :p Joking aside, estimating the size of an object in R is tricky. The old approach was very ad hoc and did not go too deep into nested objects - it simply ignored deeply nested elements. The new serializedSize approach definitely includes all components. So, I'd say that serializedSize gives a more accurate estimate of the object size.
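The difference is easiest to see with environments, which is also relevant here since R6 objects are environment-backed. A hypothetical illustration, assuming the parallelly package is available:

```r
# utils::object.size() does not follow environments, so an environment
# holding ~8 MB of data is reported as little more than a small header:
e <- new.env()
e$big <- rnorm(1e6)
utils::object.size(e)

# Serialization-based sizing includes everything reachable from the
# environment, so the ~8 MB payload is counted:
parallelly::serializedSize(e)
```

This is why nested R6 objects can appear to jump in size after the switch: the new estimator counts data the old one never saw.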

Thanks for reaching out and sharing your use case here.

@HenrikBengtsson (Collaborator)

> The process is not OOM killed.

Forgot to say, this part still puzzles me. If nothing else changed, and based on the conclusion in futureverse/parallelly#126 that serializedSize() has a constant, minimal memory footprint, I still don't understand why the OOM was triggered with serializedSize and not with objectSize. If anything, I'd expect the latter to consume a tad more memory, because it walks all the globals. Oh well, I'll probably put this one on the maybe-some-day list.

@HenrikBengtsson (Collaborator)

I'm closing, but if you, or someone else, run into this, please share here or create a new issue.
