Experimental mmap on Unix #201

Open
wants to merge 19 commits into
base: main

gaborcsardi
Member

This could potentially live in its own package, but since we pass the file descriptor to the subprocess as a processx connection, it lives here for now.

Notes:

  • Proof-of-concept (PoC) implementation.
  • conn_create_mmap() creates a piece of shared memory that contains a number of R objects. The objects are copied there, so the originals can be removed once the function returns. The copy leaves space for the SEXP header in front of each object, so when unpacking we can just put the header there, without copying anything.
  • conn_unpack_mmap() unpacks shared memory into an R list. It does not copy the memory; it uses a custom allocator and allocVector3() to create the SEXPs in place.
  • Currently we need to pass the size of the shared memory externally to the subprocess. This can be worked around by storing the size in an 8-byte integer (?), then first calling mmap() on those 8 bytes to get the size, and then calling mmap() again with the correct size.
  • Currently supported types are the ones that use a contiguous chunk of memory: REALSXP, INTSXP, LGLSXP, RAWSXP.
  • More complex types (those that are not a single contiguous chunk of memory) can be supported by writing a custom serializer and unserializer. This is a non-trivial piece of work.
  • ALTREP vectors are instantiated, since we call REAL(), INTEGER(), etc. on the vectors.
  • The subprocess uses MAP_PRIVATE, so it can modify the objects without affecting the master or the other children.
  • We open a temporary file to create an fd, and then remove the file from the file system, to make sure that nothing is written back to the disk. Then we ftruncate() and mmap(), etc.
  • We only need to copy the data to shared memory once, even if we share it with multiple subprocesses.
  • We could start the subprocess(es) right after having the fd. This way the in-memory copy and the startup of the subprocess(es) would run in parallel. By the time the child R processes are up, the shared memory would be ready. This requires additional synchronization between the main and the child process(es).
  • This mechanism allows memory sharing at process startup. It is also possible to pass a file descriptor to another process, which would allow sharing memory between processes that are already running. This requires synchronization. It could be used to pass data to a persistent worker, like a callr::r_session.
  • Unix-only implementation. All this is possible on Windows as well, including passing shared memory handles to already running processes.
  • We cannot use shm_open(), etc. because on macOS the limits for the number of pages that can be shared this way are very low.
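
To make the temporary file, ftruncate(), mmap() and MAP_PRIVATE notes concrete, here is a minimal standalone sketch. It is not the code in this PR: the helper names (make_shared_fd, fill_shared, child_modifies_privately) and the /tmp path template are made up for illustration, and fork() stands in for a subprocess that inherits the fd through a processx connection.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open a temporary file, unlink it right away so it
   only lives on through the fd, then give it the requested size. */
int make_shared_fd(size_t size) {
  char path[] = "/tmp/mmap-sketch-XXXXXX";   /* assumed location */
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  unlink(path);
  if (ftruncate(fd, (off_t) size) != 0) { close(fd); return -1; }
  return fd;
}

/* Hypothetical helper: copy n doubles into the file once, through a
   MAP_SHARED mapping, so every later mapping of the fd sees them. */
int fill_shared(int fd, const double *src, size_t n) {
  assert(n > 0);
  size_t size = n * sizeof(double);
  void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) return -1;
  memcpy(p, src, size);
  return munmap(p, size);
}

/* Hypothetical helper: a forked child maps the fd with MAP_PRIVATE and
   overwrites element 0. The parent then reads element 0 back through a
   fresh mapping and stores it in *parent_sees. Returns 0 on success. */
int child_modifies_privately(int fd, size_t n, double *parent_sees) {
  size_t size = n * sizeof(double);
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    double *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) _exit(2);
    p[0] = -1.0;                 /* copy-on-write, private to the child */
    _exit(p[0] == -1.0 ? 0 : 1);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
  double *q = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
  if (q == MAP_FAILED) return -1;
  *parent_sees = q[0];
  munmap(q, size);
  return 0;
}
```

After the child has exited, the parent should still read the value it originally copied in, since the child's write only touched its private copy-on-write pages.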

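The size workaround mentioned in the notes could look roughly like the sketch below. It assumes the 8-byte size is stored at offset 0 of the shared memory and counts the header itself; both the layout and the function names (write_with_size_header, map_using_size_header) are assumptions for illustration, not what the PR implements.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Writer side. Assumed layout: [uint64_t total size][payload bytes]. */
int write_with_size_header(int fd, const void *payload, size_t len) {
  assert(payload != NULL);
  uint64_t total = sizeof(uint64_t) + len;
  if (ftruncate(fd, (off_t) total) != 0) return -1;
  unsigned char *p = mmap(NULL, (size_t) total, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
  if (p == MAP_FAILED) return -1;
  memcpy(p, &total, sizeof total);           /* size goes first */
  memcpy(p + sizeof total, payload, len);    /* then the data */
  return munmap(p, (size_t) total);
}

/* Reader side: map only the first 8 bytes to learn the size, unmap them,
   then map the whole region privately. Returns NULL on failure. */
void *map_using_size_header(int fd, uint64_t *total) {
  void *hdr = mmap(NULL, sizeof(uint64_t), PROT_READ, MAP_PRIVATE, fd, 0);
  if (hdr == MAP_FAILED) return NULL;
  memcpy(total, hdr, sizeof *total);
  munmap(hdr, sizeof(uint64_t));
  void *all = mmap(NULL, (size_t) *total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
  return all == MAP_FAILED ? NULL : all;
}
```

With this layout the subprocess only needs the fd; the payload starts sizeof(uint64_t) bytes into the returned mapping.
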
@codecov-io

Codecov Report

Merging #201 into master will increase coverage by 1.45%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master     #201      +/-   ##
==========================================
+ Coverage   70.22%   71.67%   +1.45%     
==========================================
  Files          31       38       +7     
  Lines        2556     3838    +1282     
==========================================
+ Hits         1795     2751     +956     
- Misses        761     1087     +326
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/init.c | 100% <ø> (ø) | ⬆️ |
| R/serialize.R | 0% <0%> (ø) | |
| src/serialization.c | 0% <0%> (ø) | |
| src/client.c | 36.45% <0%> (-9.84%) | ⬇️ |
| src/create-time.c | 68.57% <0%> (-3.66%) | ⬇️ |
| src/win/utils.c | 0% <0%> (ø) | |
| src/win/stdio.c | 70.91% <0%> (ø) | |

... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cd267b3...3c89012.
