
serializedSize(): Does it allocate memory? #126

Open · HenrikBengtsson opened this issue Feb 7, 2025 · 2 comments

@HenrikBengtsson (Collaborator) commented Feb 7, 2025

Background

Over at futureverse/future#760, @fproske reports that serializedSize() consumes a lot of memory. They detected this because their containers/VMs were getting killed by OOM after upgrading to a future version that relies on serializedSize().

I think they used the profvis package to show roughly 100 MB of memory being allocated by serializedSize(). Indeed, if I run something like:

prof <- profvis::profvis({
  for (kk in 1:1e6) parallelly::serializedSize(NULL)
})

I see lots of memory being reported, e.g.

Code                                                 File    Memory (MB) [deallocated / allocated]
parallelly::serializedSize                           <expr>  -4246.6 / 4492.0
for (kk in 1:1e6) parallelly::serializedSize(NULL)   <expr>   -445.3 /  221.1

but that looks odd to me.

Troubleshooting

I'm not sure how this happens, but it could be that the internal serialization code of R that we rely on materializes each intermediate object, which we never make use of - we are only interested in the byte counts. Our code is in https://github.com/futureverse/parallelly/blob/develop/src/calc-serialized-size.c.
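For comparison, here is a quick way to see the difference at the R level between counting bytes and materializing the serialized payload (a minimal sketch; serialize() with connection = NULL returns the full payload as a raw vector, whereas serializedSize() is only supposed to count):

x <- rnorm(1e6)  # ~8 MB of doubles

## serialize() to NULL materializes the entire payload as a raw vector
raw_bytes <- serialize(x, connection = NULL)
length(raw_bytes)                # number of serialized bytes; ~8 MB allocated

## serializedSize() should return the same byte count without building the payload
parallelly::serializedSize(x)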

It could also be that something else is going on here. To better inspect the memory allocations, I went low-level with base::Rprof(), which profvis uses internally. With this, I get:

library(parallelly)
R <- 1e7  # number of replications; results below are for R = 1e5, 1e6, and 1e7

ns <- c(0, 1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7)
data <- data.frame(n = ns, size = double(length(ns)), bytes_per_call = double(length(ns)))

for (kk in seq_len(nrow(data))) {
  n <- data$n[kk]
  x <- rnorm(n)
  size <- object.size(x)
  message(sprintf("Object size: %.0f bytes", size))
  data[kk, "size"] <- size

  Rprof(memory.profiling = TRUE)
  for (rr in 1:R) { serializedSize(x) }
  Rprof(NULL)
  prof <- summaryRprof(memory = "both")
  mem_avg <- prof$by.total[['"serializedSize"', "mem.total"]] * 1024^2 / R
  data[kk, "bytes_per_call"] <- mem_avg
}

print(data)

With R = 1e5, I get:

      n     size bytes_per_call
1 0e+00       48       907.0182
2 1e+00       56       717.2260
3 1e+02      848       959.4470
4 1e+03     8048       761.2662
5 1e+04    80048      2690.6460
6 1e+05   800048      2552.2340
7 1e+06  8000048      2403.3362
8 1e+07 80000048      2819.6209

With R = 1e6, I get:

      n     size bytes_per_call
1 0e+00       48      3794.3771
2 1e+00       56      4143.5529
3 1e+02      848      2741.6068
4 1e+03     8048      2570.6889
5 1e+04    80048      1286.8125
6 1e+05   800048       298.8442
7 1e+06  8000048       294.0207
8 1e+07 80000048      2794.4550

With R = 1e7, I get:

      n     size bytes_per_call
1 0e+00       48       1587.072
2 1e+00       56       1433.246
3 1e+02      848       1556.359
4 1e+03     8048       1615.164
5 1e+04    80048       2918.145
6 1e+05   800048       1345.313
7 1e+06  8000048       1154.765
8 1e+07 80000048       5260.957

I'm not sure what to make of this, because this says that only 2-5 kB is allocated per serializedSize() call, regardless of the size of the object being measured.
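As a cross-check of these Rprof() numbers, one could also record individual allocations with the profmem package, which uses utils::Rprofmem() under the hood (a sketch, assuming profmem is installed):

library(profmem)

x <- rnorm(1e6)  # ~8 MB object
p <- profmem({
  for (kk in 1:1000) parallelly::serializedSize(x)
})
total(p)  # total bytes allocated across all 1000 calls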

@coolbutuseless, as an expert on serialization and the one who came up with serializedSize(), do you know if the internals materialize the different objects as they are being serialized? If so, do you know if the R API allows us to avoid that? For instance, if I use:

con <- file(nullfile(), open = "wb")
void <- serialize(x, connection = con)
close(con)

I think the objects being serialized are immediately streamed to the null file, avoiding any materialization in memory. I wonder if that strategy could be used in serializedSize().
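To make that concrete, here is a hypothetical helper - not part of parallelly - that streams to a temporary file instead of the null device, so the byte count can be read back with file.size():

serialized_size_via_file <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  con <- file(tmp, open = "wb")
  serialize(x, connection = con)  # streamed to disk; no in-memory payload
  close(con)
  file.size(tmp)
}

serialized_size_via_file(rnorm(1e4))  # byte count, at the cost of disk I/O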

@coolbutuseless commented Feb 9, 2025

Hi @HenrikBengtsson,

My understanding of R internals is that there is no materialization happening during the serialization process.

Within R internals, it walks the object being serialized and passes the pointers to the members of that object (and a length) to your specified callback (i.e. count_bytes()). It is not recreating the objects, nor allocating any space to keep them.

Your memory usage calcs with Rprof agree with my understanding of what is happening - no more than a few kB to serialize data - and totally independent of the size of the object being serialized. There's just a minimal number of allocations - probably associated with bookkeeping that R is doing internally while it's walking the object.
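A rough way to sanity-check this from R (a quick demonstration, not a precise measurement): reset gc()'s "max used" statistics, call serializedSize() on a large object, and confirm that "max used" barely moves:

x_big <- rnorm(1e7)  # ~80 MB of doubles

gc(reset = TRUE)                              # reset the "max used" statistics
invisible(parallelly::serializedSize(x_big))
gc()                                          # "max used" should be ~unchanged,
                                              # i.e. no ~80 MB copy was created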

@HenrikBengtsson (Collaborator, Author) commented

@coolbutuseless, thank you so much for your insight.

Within R internals, it walks the object being serialized and passes the pointers to the members of that object (and a length) ...

The "passes the pointers" is exactly what I was hoping for. Excellent.

So, it remains to be understood:

  1. why profvis::profvis() reports such different amounts of memory allocation compared to Rprof(), despite using Rprof() internally as well, and

  2. why OOM kicks in over at futureverse/future#760 ("Use of serializedSize uses significant memory") - hopefully it's just that memory consumption increased slightly after upgrading future, but enough to push it above the OOM threshold.
