Use of `serializedSize` uses significant memory #760
Comments
Thanks for reporting.

Yes, the built-in size estimator for global variables was switched from an internal ad-hoc approach to using `parallelly::serializedSize()`.

Before anything else, there is of course the possibility that other things were updated too when you upgraded the future package. Did you verify that the OOM problem goes away if you re-install future 1.33.2?

After having verified that it is indeed specific to future 1.34.0, please see if the problem goes away when you switch back to the previous estimator by setting:

`options(future.globals.objectSize.method = "objectSize")`

An alternative is to set the environment variable

`export R_FUTURE_GLOBALS_OBJECTSIZE_METHOD="objectSize"`

before starting R. This will be detected and parsed by the future package on load, such that the corresponding R option is set.

Your feedback on this will be critical for understanding and fixing this. Based on that, I will then be able to suggest the best workaround until it is resolved.

PS. You might indeed be onto something, so I created futureverse/parallelly#126 to investigate.
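The suggested checks above can be collected into a small sketch. The option and environment-variable names are taken from the comment; `remotes::install_version()` is just one possible way to reinstall an older future version, not necessarily what was intended here.

```r
## Step 1: verify the problem is specific to future 1.34.0 by going back one
## version (one possible approach; any way of installing future 1.33.2 works)
# remotes::install_version("future", version = "1.33.2")

## Step 2: with future 1.34.0 installed, switch back to the previous size
## estimator for the current R session
options(future.globals.objectSize.method = "objectSize")
```

Equivalently, before starting R:

```shell
# Detected and parsed by the future package on load; sets the corresponding R option
export R_FUTURE_GLOBALS_OBJECTSIZE_METHOD="objectSize"
```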
With the update to future version 1.34.0, memory usage increased significantly due to the use of `parallelly::serializedSize`. I created an [issue](futureverse/future#760) and downgraded future for now.
Sorry for the late response; it took a while to reproduce the issue and run some experiments.

Our initial issue was that, after upgrading to future 1.34.0, we got the warning that the size of globals would exceed 500 MB. We had never seen this warning before, and downgrading only the future package also resolved the issue (same as upgrading and setting the option suggested above).

We have done some more experiments running our application in a memory-restricted environment (using Docker).

To conclude: I'm not sure why the memory usage increased, but for us the issue is resolved, as we are happy with disabling this check (we were simply not aware that such an artificial size restriction existed before the issue started appearing). We are happy to run additional experiments if that helps with debugging. If you conclude that everything works as expected, feel free to close this.

PS: Thanks for all the work you do on the future package. It is very useful to us!
So, the background for having that error on "the size of globals would exceed 500 MB" is to protect users from sending ginormous objects over the network by mistake. Since parallelization is so easy these days, it is not unlikely that someone would end up sending 1-100's of GBs by mistake over the network without knowing. The only thing they would notice is that it would take a very long time. They might also get a notification from a sysadmin or, worse, if they're on a metered internet connection, which some people still are, a hefty bill.
So, that's one critique against this built-in protection. No matter what the threshold is, there will always be users operating just below the limit, but then one day they work with some data that pushes them just over it, and boom, they get a surprising error. That sends them down a troubleshooting path, and, worse, it might happen at the very end of a long-running pipeline. So it wastes time and effort. This is what happened to you (although here it was due to a change of the internals and not the data per se). I have thought about this since day one, and I'm considering removing this protection, i.e. disabling the limit by default.
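For context, the 500 MB threshold discussed here corresponds to the `future.globals.maxSize` option (its default is 500 MiB). A sketch of relaxing or disabling the check, assuming a current version of future:

```r
## Raise the limit on the total size of exported globals
## (default is 500 * 1024^2 bytes, i.e. 500 MiB)
options(future.globals.maxSize = 8 * 1024^3)  # allow up to 8 GiB

## ...or disable the protection entirely:
options(future.globals.maxSize = +Inf)
```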
... or, equivalently, if the old estimator is used.

Thanks for reaching out and sharing your use case here.
Forgot to say: this part still puzzles me, especially if nothing else changed, and given the conclusions in futureverse/parallelly#126.
I'm closing this, but if you, or someone else, run into it again, please share here or create a new issue.
(Please use https://github.com/futureverse/future/discussions for Q&A)
Describe the bug
Upgrading from future version 1.33.2 to version 1.34.0 results in OOM issues when running in a container. Profiling the code showed that the `serializedSize` function allocates significant amounts of memory. It's hard for me to create a reproducible example, as some of the globals I'm providing to the `future()` call are nested R6 objects. Is it discouraged to send such complex data structures?

Expected behavior
Uses significantly less memory, as in previous versions.
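For anyone trying to reproduce this, a minimal sketch of comparing the two size estimates on a given set of globals (assuming `parallelly::serializedSize()` is available in your installed parallelly version; the older `"objectSize"` method in future is based on an internal variant of `utils::object.size()`, shown here with the base function for illustration):

```r
library(parallelly)

## A stand-in for a large set of globals; the real report involves nested R6 objects
x <- replicate(100, rnorm(1e4), simplify = FALSE)

## In-memory size, similar in spirit to the estimator used by future <= 1.33.2
print(utils::object.size(x))

## Size when serialized, the estimator used by future 1.34.0 via parallelly
print(parallelly::serializedSize(x))
```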
Session information
Occurs both on my macOS ARM machine and on Linux pods in a Kubernetes cluster. future_1.34.0, parallelly_1.42.0, globals_0.16.3