serializedSize(): Does it allocate memory? #126
Comments
Hi @HenrikBengtsson, my understanding of R internals is that no materialization happens during the serialization process. Internally, R walks the object being serialized and passes pointers to the members of that object (and a length) to your specified callback (i.e. Your memory usage calcs with |
@coolbutuseless, thank you so much for your insight.
The "passes the pointers" part is exactly what I was hoping for. Excellent. So, it's still to be understood:
|
Background
Over at futureverse/future#760, @fproske reports that `serializedSize()` consumes a lot of memory. They detected this because their containers/VMs are getting killed by OOM after upgrading to a version of future that relies on `serializedSize()`.

I think they used the profvis package to show that about 100 MB of memory is allocated by `serializedSize()`. Indeed, if I run something like:

I see lots of memory being reported, e.g.

[profvis output: `parallelly::serializedSize` under `for (kk in 1:1e6) parallelly::serializedSize(NULL)`]

but that looks odd to me.
Troubleshooting
I'm not sure how this happens, but it could be that R's internal serialization code, which we rely on, materializes each intermediate object, even though we never make use of those objects; we are only interested in the byte counts. Our code is in https://github.com/futureverse/parallelly/blob/develop/src/calc-serialized-size.c.
It could be that something else is going on here. To better inspect the memory allocations, I'm going low-level with `utils::Rprof()`, which profvis uses internally. With this, I get:

With `R = 1e5`, I get:

With `R = 1e6`, I get:

With `R = 1e7`, I get:

I'm not sure what to make of this, because this says that only 2-5 kB is allocated per `serializedSize()` call, regardless of the size of the object being sized.

@coolbutuseless, as an expert on serialization and the one who came up with `serializedSize()`, do you know if the internals materialize the different objects as they are being serialized? If so, do you know if the R API allows us to avoid that? For instance, if I use:

I think the objects being serialized are immediately streamed to the null file, avoiding any materializing in memory. I wonder if that strategy could be used in `serializedSize()`.
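The Rprof() memory profiling described above could be reproduced along these lines. This is a minimal sketch, not the code used in this issue: the helper name `profile_sizing()` and its arguments are hypothetical, and the default sizing function relies on base R's `serialize()` so that the block runs without parallelly installed. Pass `parallelly::serializedSize` as `size_fn` to profile that function instead.

```r
# Hypothetical helper (not part of parallelly): repeatedly size the same
# object while Rprof() records memory use, then summarize the profile.
profile_sizing <- function(x, times = 100L,
                           size_fn = function(obj) length(serialize(obj, connection = NULL))) {
  prof_file <- tempfile(fileext = ".out")
  on.exit(unlink(prof_file), add = TRUE)

  # Start the sampling profiler with memory statistics enabled
  utils::Rprof(prof_file, memory.profiling = TRUE, interval = 0.01)
  for (kk in seq_len(times)) size_fn(x)
  # Stop profiling
  utils::Rprof(NULL)

  # memory = "both" adds memory figures next to the timing figures
  utils::summaryRprof(prof_file, memory = "both")
}

# Assumption: an ~8 MB numeric vector, so each call is slow enough to be sampled
res <- profile_sizing(rnorm(1e6), times = 20L)
print(res$by.total)
```

Because Rprof() is a sampling profiler, a run that finishes faster than the sampling interval yields an empty profile, so very cheap calls such as `serializedSize(NULL)` need many repetitions before anything is recorded.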
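The null-file idea mentioned above can be sketched in base R as follows. The snippet referred to in the post is not shown, so this is an assumption about what it looked like: it contrasts serializing into an in-memory raw vector with streaming the same bytes to a connection opened on `nullfile()` (available in R >= 3.6.0). The object `x` is only an illustration.

```r
# Assumption: any R object will do; a numeric vector is used for illustration
x <- rnorm(1e5)

# Approach 1: serialize into a raw vector and count its bytes.
# This materializes the complete serialized payload in memory.
size_materialized <- length(serialize(x, connection = NULL))

# Approach 2: stream the serialized bytes to the null device.
# The connection must be binary for the default (non-ASCII) format.
con <- file(nullfile(), open = "wb")
res <- serialize(x, connection = con)  # returns NULL when a connection is given
close(con)

print(size_materialized)
print(is.null(res))
```

Note that the second approach discards the bytes without counting them, so on its own it does not return a size. Whether it avoids intermediate allocations inside R is exactly the open question raised here.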