processx
Experimental mmap on Unix
This could be in its own package, potentially, but since we pass the file descriptor to the subprocess as a processx connection, it is here now.
Notes:
- PoC implementation.
- `conn_create_mmap()` creates a piece of shared memory that contains a set of R objects. The objects are copied there, so the originals can be removed once the function returns. The copy leaves space for the SEXP header in front of each object, so when unpacking we can just put the header there, without copying anything.
- `conn_unpack_mmap()` unpacks shared memory into an R list. It does not copy the memory; it uses a custom allocator and `allocVector3()` to create the SEXPs in place.
- Currently we need to pass the size of the shared memory externally to the subprocess. This can be worked around by storing the size in an 8-byte integer (?), then first calling `mmap()` on those 8 bytes to get the size, and then calling `mmap()` again with the correct size.
- Currently supported types are the ones that use a contiguous chunk of memory: `REALSXP`, `INTSXP`, `LGLSXP`, `RAWSXP`.
- Complex types can be supported via writing a custom serializer and un-serializer. This is a non-trivial piece of work.
- ALTREP vectors are instantiated, since we call `REAL()`, `INTEGER()`, etc. on the vectors.
- The subprocess uses `MAP_PRIVATE`, so it can modify the objects without affecting the master or the other children.
- We open a temporary file to create an fd, and then remove the file from the file system, to make sure that nothing is written back to the disk. Then we `ftruncate()` and `mmap()`, etc.
- We only need to copy the data to shared memory once, even if we share it with multiple subprocesses.
- We could start the subprocess(es) right after having the fd. This way the in-memory copy and the startup of the subprocess(es) would run in parallel. By the time the child R processes are up, the shared memory would be ready. This requires additional synchronization between the main and the child process(es).
- This mechanism allows memory sharing at process startup. It is also possible to pass a file descriptor to another process, which would allow sharing memory between processes that are already running. This requires synchronization. It could be used to pass data to a persistent worker, like a `callr::r_session`.
- Unix-only implementation. All this is possible on Windows as well, including passing shared memory handles to already running processes.
- We cannot use `shm_open()`, etc. because on macOS the limits for the number of pages that can be shared this way are very low.
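
The temporary-file, size-header and `MAP_PRIVATE` notes above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the code in this PR: `shared_view`, `share_doubles()`, `view_open()` and `mmap_demo()` are hypothetical names, the payload is a bare array of doubles with no SEXP header, the `/tmp` template path is an assumption, and both private mappings live in one process instead of in separate subprocesses that inherit the fd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical view of the shared region: 8-byte total size, then doubles. */
typedef struct {
  void *base;     /* start of the private mapping */
  size_t total;   /* total mapped size in bytes, header included */
  double *data;   /* first double, right after the size header */
  size_t n;       /* number of doubles */
} shared_view;

/* Copy n doubles into an unlinked temporary file and return its fd. */
static int share_doubles(const double *x, size_t n) {
  assert(x != NULL && n > 0);
  char path[] = "/tmp/mmap-demo-XXXXXX";   /* assumed location */
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  unlink(path);   /* the name is gone, the open fd keeps the file alive */

  uint64_t total = sizeof(uint64_t) + n * sizeof(double);
  if (ftruncate(fd, (off_t) total) != 0) { close(fd); return -1; }

  char *mem = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (mem == MAP_FAILED) { close(fd); return -1; }
  memcpy(mem, &total, sizeof total);                  /* size header */
  memcpy(mem + sizeof total, x, n * sizeof(double));  /* single copy */
  munmap(mem, total);
  return fd;
}

/* Map the 8-byte header to learn the size, then map everything privately. */
static int view_open(int fd, shared_view *v) {
  uint64_t *hdr = mmap(NULL, sizeof(uint64_t), PROT_READ, MAP_PRIVATE, fd, 0);
  if (hdr == MAP_FAILED) return -1;
  uint64_t total = *hdr;
  munmap(hdr, sizeof(uint64_t));

  void *base = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
  if (base == MAP_FAILED) return -1;
  v->base = base;
  v->total = total;
  v->data = (double *) ((char *) base + sizeof(uint64_t));
  v->n = (total - sizeof(uint64_t)) / sizeof(double);
  return 0;
}

/* Returns 0 if two private views see the data but not each other's writes. */
static int mmap_demo(void) {
  double x[] = { 1.0, 2.0, 3.0 };
  int fd = share_doubles(x, 3);
  if (fd < 0) return -1;

  shared_view a, b;
  if (view_open(fd, &a) != 0) { close(fd); return -1; }
  if (view_open(fd, &b) != 0) { munmap(a.base, a.total); close(fd); return -1; }

  a.data[0] = 42.0;   /* copy-on-write: only view a changes */
  int ok = a.n == 3 && b.n == 3 &&
           a.data[0] == 42.0 && b.data[0] == 1.0 && b.data[2] == 3.0;

  munmap(a.base, a.total);
  munmap(b.base, b.total);
  close(fd);
  return ok ? 0 : 1;
}
```

Calling `mmap_demo()` from a small `main()` exercises the whole path. The size header removes the need to pass the length out of band, at the cost of one extra short-lived `mmap()` call per reader.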
Codecov Report
Merging #201 into master will increase coverage by 1.45%. The diff coverage is 0%.
@@ Coverage Diff @@
## master #201 +/- ##
==========================================
+ Coverage 70.22% 71.67% +1.45%
==========================================
Files 31 38 +7
Lines 2556 3838 +1282
==========================================
+ Hits 1795 2751 +956
- Misses 761 1087 +326
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/init.c | 100% <ø> (ø) | :arrow_up: |
| R/serialize.R | 0% <0%> (ø) | |
| src/serialization.c | 0% <0%> (ø) | |
| src/client.c | 36.45% <0%> (-9.84%) | :arrow_down: |
| src/create-time.c | 68.57% <0%> (-3.66%) | :arrow_down: |
| src/win/utils.c | 0% <0%> (ø) | |
| src/win/stdio.c | 70.91% <0%> (ø) | |
| ... and 8 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cd267b3...3c89012.