Implement more graceful tokio limbo send_off
Hi! First of all, thank you for your work! Interprocess has been very beneficial in my projects so far.
This PR addresses a potential panic in the SendHalf drop implementation and the send_off method The new implementation takes a more graceful approach when attempting to send off the corpse. It falls back to a synchronous bury if there is no active limbo channel and there is no tokio runtime is available.
I also removed the static runtime because I think starting a new runtime in a drop implementation is a bit much hidden cost. The static runtime would also not help in this case because tokio disallows starting a runtime inside a runtime. (even when one is shutting down)
I noticed that most drop implementations in this crate do not bury the corpse directly but send it off either to another thread or here to a tokio runtime. I would not expect much performance (latency) regression with this synchronous bury because this is more a edge-case. But I'm not completly in the picture, do you see it the same way?
Off-topic from this PR: I feel that having that much logic in the drop implemenations is rather bad because there is no way to propagate results back to the crate user. Maybe consider in the future to refactor the bury logic into seperate .shutdown() or .disconnect() methods without implict thread or task spawning. But that looks like no easy task though, so thats another story.
Also note: I left quite a few comments with more details in the repro and in the PR.
Thanks for looking into it! 😊
Issue: #71
This fix isn't quite right, since bury_sync() blocks runtime shutdown (potentially causing a lockup due to untrusted clients, which qualifies as DoS in certain configurations). Note that the whole point of this limbo construction is to hide the fact that blocking has to happen for send buffers to be preserved (send buffer preservation being a compatibility necessity).
The more correct solution here would be to send the handle to the sync limbo instead, which avoids blocking the runtime during the shutdown process.
I also removed the static runtime because I think starting a new runtime in a drop implementation is a bit much hidden cost. The static runtime would also not help in this case because tokio disallows starting a runtime inside a runtime. (even when one is shutting down)
Yea, that bit is a little silly. My original intent there was to piggyback off of Tokio's thread pool and thus dead-code-eliminate the one sync named pipes roll in crates that only use the Tokio side of things.
I feel that having that much logic in the drop implemenations is rather bad because there is no way to propagate results back to the crate user.
Close errors can be explicitly requested via .flush(). Programs that care about handling those should flush explicitly and thus remove the limbo from the equation entirely (if you flush the stream and drop it immediately after, it does not get sent to limbo).
Oh right, I did not think about untrusted clients and potential lockups. Thanks for pointing out .flush(), I also did not now about it yet.
After giving it some more thoughts, I think starting a static tokio runtime maybe isn't so fancy, but it is indeed one of the better solutions here. Sending the corpses off to the sync limbo requires some refactoring. Currently they are not compatible with each other.
I updated the PR with a send_off_to_guaranteed_limbo method inplace of the bury_sync method. With this approach we would have a guaranteed available limbo which runs on its own runtime. This has quite the similarities with the previous implementation, just with a bit more graceful handling to properly catch the shutdown edge-case.
I have two open thoughts though:
- Should we always use the guaranteed limbo after it was initialized, or should we try with each send_off if we can find a tokio runtime from the users code? I'm not quite sure how tokio handles threads of blocking tasks in detail, and if we could benefit to falling back to a user runtime to potentially reuse threads there.
- What is the reason behind using the custom
FileHandle::flush_hndl(handle)method when we have the concrete structs available (NamedPipeServer, NamedPipeClient and File) which all have a async .flush() method? I know that they are certainly also just blocking on some FlushFileBuffers call, but wouldn't it be neater to reuse the tokio code for this?
Sending the corpses off to the sync limbo requires some refactoring. Currently they are not compatible with each other.
This is a matter of turning the Tokio stream into an owned handle. impl Drop for PipeStream already freaks out if the pipe stream is dropped outside of a Tokio runtime, so extracting the owned handle, ignoring the error and sending it off to the sync limbo shouldn't be a problematic change.
wouldn't it be neater to reuse the tokio code for this?
To the best of my knowledge, there is nothing to reuse – Tokio doesn't flush in named pipes (as in, there isn't a single FlushFileBuffers() call in their named pipe code), and the .flush() method on NamedPipeServer and NamedPipeClient comes from AsyncWrite by way of AsyncWriteExt, where AsyncWrite::poll_flush() successfully does absolutely nothing. Thus, FileHandle::flush_hndl() is used in the implementation of .flush() on PipeStream with a bit of additional machinery known as a Tokio flusher, which puts the FileHandle::flush_hndl() call on a spawn_blocking() task.
Using Tokio's File (which has a non-trivial .poll_flush() implementation) under the hood isn't an option either, since its flush implementation concerns itself with allowing async code to wait for completion of write operations, such that dropping File immediately reclaims resources. Some similarities to my limbo approach can be seen there.
To the best of my knowledge, there is nothing to reuse
Alright. It was just a thought. I did not go all the way down the flush method. Good to know.
This is a matter of turning the Tokio stream into an owned handle ... shouldn't be a problematic change.
I did not plan implementing changes or do refactoring outside of the tokio limbo here. This is not something I would be comfortable doing. I am so far fine with the current PR then.
Having grown a bit dissatisfied with the limbo design, I ended up rewriting it entirely (sync and Tokio named pipes now use mostly the same code path, and the whole thing is quite a bit simpler and more efficient), meaning this PR is no longer applicable. The discussion here contributed to the new design, though, so your work was not in vain. Thank you for contributing!