Use Notify instead of InternalsGuard
Potential fix for https://github.com/djc/bb8/issues/154
## Problem
InternalsGuard is problematic because it can deadlock. Using a reentrant lock is not possible because both put and drop hold a mutable reference to the internals. RefCell won't work either: making this scheme work would require two or more simultaneous mutable borrows, which RefCell rejects at runtime because that would violate Rust's aliasing guarantees.
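A minimal illustration of the RefCell point (not bb8 code): the second overlapping mutable borrow is rejected at runtime.

```rust
use std::cell::RefCell;

fn main() {
    let internals = RefCell::new(Vec::<u32>::new());
    let _held_by_put = internals.borrow_mut();
    // A second simultaneous mutable borrow, as put and drop would
    // both need here, panics with a BorrowMutError at runtime.
    let _held_by_drop = internals.borrow_mut();
}
```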
## Possible Solution
Switch to Tokio's Notify, which provides a fair queue for waiters. The pool no longer internally guarantees that connections are given to the tasks that have waited the longest, but this may be fair enough in practice, provided the Tokio scheduler and the kernel scheduler are fair as well. Starvation is still possible if a caller retries get in an infinite loop.
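A rough sketch of the pattern, using hypothetical names rather than the actual bb8 internals: tasks that find no idle connection park on notified() and retry, while put returns the connection and wakes exactly one waiter with notify_one.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use tokio::sync::Notify;

// Hypothetical, heavily simplified pool core; not bb8's real code.
struct PoolInner<C> {
    idle: Mutex<VecDeque<C>>,
    notify: Notify,
}

impl<C> PoolInner<C> {
    async fn get(&self) -> C {
        loop {
            if let Some(conn) = self.idle.lock().unwrap().pop_front() {
                return conn;
            }
            // Park until a connection is returned. Notify keeps its
            // waiters in a FIFO queue and notify_one wakes only the
            // waiter at the front, so fairness comes from Notify and
            // the schedulers rather than from the pool itself.
            self.notify.notified().await;
        }
    }

    fn put(&self, conn: C) {
        self.idle.lock().unwrap().push_back(conn);
        self.notify.notify_one();
    }
}
```

If notify_one is called while no task is waiting, Notify stores a single permit, so the next notified().await returns immediately and the loop re-checks the idle queue; that covers the race between checking the queue and starting to wait.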
Additionally, the entire make_pooled function is now timed out using the connection_timeout setting. This ensures there is no internal starvation and that is_valid() is bounded as well; without that bound, a hung validity check could starve all tasks and block the caller indefinitely while waiting for a future to resolve.
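A sketch of how that bound could look, reusing the hypothetical PoolInner above: one tokio::time::timeout wraps the whole acquisition, so every step inside it, including any validity check, shares the same budget.

```rust
use std::time::Duration;
use tokio::time::timeout;

// Hypothetical helper, not bb8's actual make_pooled signature: the
// whole acquisition path runs under one connection_timeout budget,
// so no single step can block the caller forever.
async fn get_timed<C>(pool: &PoolInner<C>, connection_timeout: Duration) -> Option<C> {
    timeout(connection_timeout, pool.get()).await.ok()
}
```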
## Open Questions
The error forwarding is not clear to me. I removed it, but I'm not entirely sure what it does; I may need some help here to understand whether I broke something.
Codecov Report
Patch coverage: 96.96% and project coverage change: +0.84% :tada:
Comparison is base (ea5c162) 72.12% compared to head (0dc8eda) 72.96%.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   72.12%   72.96%   +0.84%
==========================================
  Files           6        6
  Lines         599      603       +4
==========================================
+ Hits          432      440       +8
+ Misses        167      163       -4
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| bb8/src/inner.rs | 87.38% <95.91%> (+1.06%) | :arrow_up: |
| bb8/src/internals.rs | 95.45% <100.00%> (+2.69%) | :arrow_up: |
Has there been any progress with the tests? Is this PR blocked waiting on some discussion or support?
Not really, just waiting for djc to take a look.
Sorry for taking so long to look at this. So if I understand correctly, this fix will make the pool less fair? And also, more tasks have to do work if a number of them are waiting for a connection to become available, whereas previously they would just spend that time sleeping?
Because, while the previous design with waiters made it so that earlier requesters would be sent the freed-up connection, in this design all waiters are notified and then make another attempt at getting a connection?
I don't think so. Notify is fair according to the docs and notify_one notifies only one task, not all of them.
> So if I understand correctly, this fix will make the pool less fair?
I took a look at the implementation of Notify. It uses a FIFO queue [1] [2] for tasks, so the first task to .await will be the first one to be woken up. I think that's good enough to maintain fairness, since the pool will never wake up more tasks than there are available connections. The only way I can think of for a task to skip the queue is a bug in the Tokio scheduler.
[1] https://docs.rs/tokio/latest/src/tokio/sync/notify.rs.html#985 [2] https://docs.rs/tokio/latest/src/tokio/sync/notify.rs.html#732
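A small self-contained experiment (not part of this PR) that matches the FIFO behavior described in the linked sources; on the current-thread runtime the waiters register in spawn order and are woken in that same order:

```rust
use std::sync::Arc;
use tokio::sync::Notify;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let notify = Arc::new(Notify::new());
    let mut handles = Vec::new();

    for i in 0..3 {
        let notify = Arc::clone(&notify);
        handles.push(tokio::spawn(async move {
            notify.notified().await;
            println!("task {i} woken");
        }));
        // Yield so each task polls notified() and registers as a
        // waiter before the next one is spawned.
        tokio::task::yield_now().await;
    }

    for _ in 0..3 {
        // Each notify_one wakes the waiter at the front of the queue,
        // so this prints "task 0 woken" through "task 2 woken" in order.
        notify.notify_one();
        tokio::task::yield_now().await;
    }

    for handle in handles {
        handle.await.unwrap();
    }
}
```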
I tried to simplify this a bit, please take a look at #186.