
Experimental mmap on Unix

Open • gaborcsardi opened this pull request 6 years ago • 1 comment

This could potentially be in its own package, but since we pass the file descriptor to the subprocess as a processx connection, it is here for now.

Notes:

  • Proof-of-concept implementation.
  • conn_create_mmap() creates a piece of shared memory that contains a bunch of R objects. The objects are copied there, so the originals can be removed once the function returns. The copy leaves space for the SEXP header in front of each object, so when unpacking we can just write the header there, without copying anything.
  • conn_unpack_mmap() unpacks shared memory into an R list. It does not copy the memory; it uses a custom allocator and allocVector3() to create the SEXPs in place.
  • Currently we need to pass the size of the shared memory to the subprocess externally. This can be worked around by storing the size in an 8-byte integer (?) at the start of the region: the child first mmap()s those 8 bytes to get the size, and then mmap()s again with the correct size.
  • Currently supported types are the ones that use a contiguous chunk of memory: REALSXP, INTSXP, LGLSXP, RAWSXP.
  • More complex (non-contiguous) types can be supported by writing a custom serializer and unserializer. This is a non-trivial piece of work.
  • ALTREP vectors are materialized, since we call REAL(), INTEGER(), etc. on the vectors.
  • The subprocess uses MAP_PRIVATE, so it can modify the objects, without affecting the master or the other children.
  • We open a temporary file to get an fd, and then remove the file from the file system, to make sure that nothing is written back to the disk. Then we ftruncate() and mmap(), etc.
  • We only need to copy the data to shared memory once, even if we share it with multiple subprocesses.
  • We could start the subprocess(es) right after creating the fd. This way the in-memory copy and the startup of the subprocess(es) would run in parallel, and by the time the child R processes are up, the shared memory would be ready. This requires additional synchronization between the main and the child process(es).
  • This mechanism allows memory sharing at process startup. It is also possible to pass a file descriptor to another process, which would allow sharing memory between processes that are already running. This requires synchronization. It could be used to pass data to a persistent worker, like a callr::r_session.
  • Unix-only implementation. All of this is possible on Windows as well, including passing shared memory handles to already running processes.
  • We cannot use shm_open(), etc., because on macOS the limit on the number of pages that can be shared this way is very low.
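A minimal C sketch of the region setup from the notes above, under stated assumptions: shm_create() and shm_attach() are hypothetical names, not the PR's conn_create_mmap()/conn_unpack_mmap(); the /tmp name template is made up; and the size travels in an 8-byte prefix at the start of the region (the workaround suggested in the notes) instead of being passed externally. The parent maps the unlinked file with MAP_SHARED and writes the data once; a child maps only the prefix, reads the size, and then maps the whole region with MAP_PRIVATE, so its writes are copy-on-write and stay local.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp(), ftruncate() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Bytes reserved at the start of the region for the payload size. */
#define SIZE_PREFIX sizeof(uint64_t)

/* Parent side: create a region backed by an unlinked temporary file.
 * Writes the payload size into the first 8 bytes, stores a MAP_SHARED
 * pointer to the payload area in *data, and returns the fd (-1 on error). */
static int shm_create(size_t payload, void **data) {
    char path[] = "/tmp/shm-sketch-XXXXXX";   /* hypothetical name template */
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);                 /* from now on the file exists only via fd */

    size_t total = SIZE_PREFIX + payload;
    if (ftruncate(fd, (off_t) total) == -1) { close(fd); return -1; }
    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { close(fd); return -1; }

    uint64_t size = (uint64_t) payload;
    memcpy(base, &size, sizeof size);          /* the 8-byte size prefix */
    *data = (unsigned char *) base + SIZE_PREFIX;
    return fd;
}

/* Child side: map only the prefix to learn the size, then map the whole
 * region MAP_PRIVATE so any writes stay private to this process. */
static void *shm_attach(int fd, size_t *payload) {
    void *head = mmap(NULL, SIZE_PREFIX, PROT_READ, MAP_PRIVATE, fd, 0);
    if (head == MAP_FAILED) return NULL;
    uint64_t size;
    memcpy(&size, head, sizeof size);
    munmap(head, SIZE_PREFIX);

    size_t total = SIZE_PREFIX + (size_t) size;
    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) return NULL;
    *payload = (size_t) size;
    return (unsigned char *) base + SIZE_PREFIX;
}
```

In the real setup the child would receive the fd through the processx connection. Here both sides can run in one process only to keep the sketch self-contained; the second mapping still behaves as a private, copy-on-write view of the same data.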
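The idea of leaving room for the SEXP header in front of each object can be shown with plain offset arithmetic. This is a sketch only: HEADER_BYTES and ALIGN are placeholder assumptions (the space R actually needs in front of a vector depends on R's internals), chunk, layout() and pack() are hypothetical names, and a plain buffer stands in for the mapped region. Only contiguous payloads, like the memory behind REAL(), INTEGER(), LOGICAL() and RAW(), fit this scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: placeholder size of the gap left in front of each payload for
 * the SEXP header. The real size depends on R internals; not an R constant. */
#define HEADER_BYTES 48
#define ALIGN 8
_Static_assert(HEADER_BYTES % ALIGN == 0, "gap must keep payloads aligned");

/* One contiguous payload, e.g. the memory behind REAL() or RAW(). */
typedef struct {
    const void *src;
    size_t bytes;
} chunk;

static size_t align_up(size_t x) {
    return (x + ALIGN - 1) & ~(size_t) (ALIGN - 1);
}

/* Pass 1: place each payload right after a HEADER_BYTES gap, so a header
 * can later be written directly in front of the data. Returns total size. */
static size_t layout(const chunk *c, size_t n, size_t *offsets) {
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        pos += HEADER_BYTES;            /* reserved, left unwritten here */
        offsets[i] = pos;               /* payload starts right after the gap */
        pos = align_up(pos + c[i].bytes);
    }
    return pos;
}

/* Pass 2: copy each payload once into its slot of the (shared) region. */
static void pack(unsigned char *region, const chunk *c, size_t n,
                 const size_t *offsets) {
    for (size_t i = 0; i < n; i++)
        memcpy(region + offsets[i], c[i].src, c[i].bytes);
}
```

Computing all offsets first gives the total size to pass to ftruncate() before anything is copied, and each payload is then copied exactly once. The gap in front of each slot is left for the unpacking side.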
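For sharing memory with a process that is already running, the usual Unix tool is to send the fd over a Unix domain socket as SCM_RIGHTS ancillary data. Whether processx would use this exact mechanism is an assumption, and send_fd()/recv_fd() are hypothetical helper names. The receiver gets a new descriptor referring to the same open file, which it could then mmap() just like an inherited fd.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected Unix domain socket as SCM_RIGHTS data. */
static int send_fd(int sock, int fd) {
    char byte = 0;  /* stream sockets need at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                              /* union keeps the buffer aligned */
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; the kernel installs a new descriptor referring to the
 * same open file. Returns that descriptor, or -1 on error. */
static int recv_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;

    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1) return -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS ||
        cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}
```

The synchronization mentioned in the notes (telling the receiver when the data is complete) is not shown; in this sketch the single data byte only carries the descriptor.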

gaborcsardi, Jun 10 '19

Codecov Report

Merging #201 into master will increase coverage by 1.45%. The diff coverage is 0%.


@@            Coverage Diff             @@
##           master     #201      +/-   ##
==========================================
+ Coverage   70.22%   71.67%   +1.45%     
==========================================
  Files          31       38       +7     
  Lines        2556     3838    +1282     
==========================================
+ Hits         1795     2751     +956     
- Misses        761     1087     +326

Impacted Files        Coverage Δ
src/init.c            100% <ø> (ø)          ↑
R/serialize.R         0% <0%> (ø)
src/serialization.c   0% <0%> (ø)
src/client.c          36.45% <0%> (-9.84%)  ↓
src/create-time.c     68.57% <0%> (-3.66%)  ↓
src/win/utils.c       0% <0%> (ø)
src/win/stdio.c       70.91% <0%> (ø)
... and 8 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cd267b3...3c89012.

codecov-io, Jul 25 '19