async-process icon indicating copy to clipboard operation
async-process copied to clipboard

Use a more efficient strategy for waiting on processes

Open notgull opened this issue 2 years ago • 9 comments

Right now, we use SIGPROC to wait for process events on Unix and thread pooling to wait for process events on Windows. This is a very inefficient strategy that doesn't scale for many child processes, like using poll() for I/O notification.

Most operating systems have better ways of handling events like this.

  • Linux has pidfd, which can be registered directly into async-io.
  • BSD has EVFILT_PROC, which is now exposed in async-io.
  • Windows has waitable handles, which need to be exposed in async-io. They can also be made more efficient on the polling side, see smol-rs/async-io#25

notgull avatar Aug 20 '23 19:08 notgull

Will look into this

mamaicode avatar Nov 13 '23 23:11 mamaicode

How this would probably be done:

  • Split the signal-handling logic out of Reaper and move it to a signal.rs file.
  • Create an alternative pidfd.rs file that implements the Reaper logic using Async<OwnedFd> and pidfd_open.
  • Child::wait should be implemented by calling readable() on the PIDFD to wait.
  • Zombie processes should be pushed into an executor that replaces the reap() function.

notgull avatar Nov 14 '23 02:11 notgull

Sorry for the delay! Will start working on this tomorrow

mamaicode avatar Nov 28 '23 11:11 mamaicode

@mamaicode Great! Let me know if you need any guidance; I can hop in a Matrix/Discord call for pair programming if needed.

notgull avatar Dec 03 '23 05:12 notgull

pidfd has been implemented for Linux. However, we should also be able to implement this for Windows and BSD as well.

notgull avatar Apr 20 '24 18:04 notgull

I was looking to contribute the wait implementation for BSD systems but after giving it a try with the async-io crate I think I would appreciate some guidance. My initial idea was to use the Exit structure to wrap the child process and rely on the async-io reactor for the async registration/poll. This works fine, but the main issue is the Exit structure takes ownership of the Child structure, and as it does not implement Clone/Copy we cannot use it for retrieving the exit code with a try_wait() call. The Exit code and a mutable reference to the child structure is required based on the current crate's api.

@notgull Any idea of how we could solve this issue? The async-io crate registers the process to wait for using the Process structure which does not own the Child struct, instead it relies on its pid and a Child reference. Would it be feasible to add a new/modify Exit structure in async-io that does not take ownership of the Child structure?

rogercoll avatar Sep 26 '24 18:09 rogercoll

I think this could be solved by giving the Exit struct an unsafe method that lets you take a mutable reference to the inner child, then wrapping that in async-process.

notgull avatar Sep 27 '24 16:09 notgull

I think this could be solved by giving the Exit struct an unsafe method that lets you take a mutable reference to the inner child, then wrapping that in async-process.

The thing is that the Child ownership in the Exit struct is given to the Registration struct when its registred in the Reactor:

impl QueueableSealed for Exit {
    fn registration(&mut self) -> Registration {
        Registration::Process(self.0.take().expect("Cannot reregister child"))
    }
}

https://github.com/smol-rs/async-io/blob/master/src/os/kqueue.rs#L252

rogercoll avatar Oct 01 '24 09:10 rogercoll

I think the solution here is to use rustix::process::Pid::from_child to get the PID of the child process, and modify Registration to use that instead.

notgull avatar Oct 06 '24 16:10 notgull