[Feature] Support Idle Callback in TaskGroup for more flexible usage
What problem does this PR solve?
- Implement Idle Hook: Added TaskGroup::SetWorkerIdleCallback to allow executing custom logic (e.g., IO polling) when a worker thread is idle.
- Support Timeout Wait: Modified ParkingLot::wait to support an optional timeout, preventing workers from sleeping indefinitely when an idle callback is registered.
- Enable Thread-per-Core IO: Enabled thread-local IO management (like io_uring) by invoking the hook within the worker's thread context.
- Add Unit Test: Added bthread_idle_unittest to verify worker isolation and idle callback execution.
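The worker-side mechanism described above can be modeled in plain C++. The sketch below is self-contained and is NOT brpc's actual code; `MiniWorker` and `CountIdleCalls` are invented names. It shows the core idea: once an idle callback is registered, the worker parks with a timeout instead of sleeping indefinitely, and runs the callback whenever it wakes with nothing to do.

```cpp
// Simplified model of the PR's mechanism (not brpc's actual code).
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

class MiniWorker {
public:
    void SetIdleCallback(std::function<void()> cb) { idle_cb_ = std::move(cb); }

    void Stop() {
        { std::lock_guard<std::mutex> g(mu_); stop_ = true; }
        cv_.notify_one();
    }

    void Run() {
        std::unique_lock<std::mutex> lk(mu_);
        while (!stop_) {
            if (idle_cb_) {
                // Park with a timeout so the idle callback gets a chance to
                // run (e.g. to reap io_uring CQEs) even if no other thread
                // ever signals this worker.
                cv_.wait_for(lk, std::chrono::milliseconds(1));
                if (!stop_) idle_cb_();
            } else {
                cv_.wait(lk);  // old behavior: sleep until signaled
            }
        }
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::function<void()> idle_cb_;
};

// Runs a worker for ~50 ms and returns how often the idle callback fired.
int CountIdleCalls() {
    MiniWorker w;
    std::atomic<int> calls{0};
    w.SetIdleCallback([&] { calls.fetch_add(1); });
    std::thread t([&] { w.Run(); });
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    w.Stop();
    t.join();
    return calls.load();
}
```

In the real patch the timed park would live in `ParkingLot::wait` and the callback in `TaskGroup`'s main loop, but the control flow is the same.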
The main reasons we need this are:
- We want to make sure all io_uring CQEs (or results from similar async engines, including some network call results) can be signaled within their original task group (we want to avoid cross-thread signaling, which we have observed to be very slow).
- By using a user-defined callback, we can implement the following strategy:
  - A bthread submits to io_uring and tries to reap the CQE result.
  - If no CQE is found, it calls wait() here and the next bthread will wake it up.
  - But if the current bthread is the last one, we rely on the idle callback in the task group to wake it up.
- This makes the whole stack thread-per-core and io_uring-per-thread: we don't need another polling thread to reap all the CQEs, which would make cross-thread signaling hard to avoid.
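The steps above can be sketched as pseudocode. The names `submit_to_local_iouring`, `try_reap_local_cqes`, and `IoRequest` are hypothetical, not real brpc or liburing APIs; `butex_wait` is bthread's internal parking primitive.

```cpp
// Pseudocode only -- hypothetical names, not a compilable brpc patch.
void DoAsyncRead(IoRequest* req) {
    submit_to_local_iouring(req);          // SQE goes to this worker's ring
    while (!req->done) {
        if (try_reap_local_cqes() > 0) {   // opportunistic, non-blocking reap
            continue;                      // maybe our CQE just arrived
        }
        // No CQE yet: park this bthread. Another runnable bthread -- or,
        // if we are the last one, the task group's idle callback after it
        // reaps the CQE -- will wake us via butex.
        butex_wait(req->butex, req->expected_value, NULL);
    }
}
```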
Issue Number: none
Problem Summary:
What is changed and the side effects?
Changed:
- task_group.h/.cc
  - Added a new function and a few related static member variables to handle idle callbacks.
- parking_lot.h
  - Added a new `timeout` param to the `wait()` function, with a default NULL value, which will not break the current implementation.
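The parking_lot.h change might look roughly like this. This is a sketch based on brpc's existing `ParkingLot::wait`, which hard-codes a NULL timeout when calling `futex_wait_private`; the exact form in the PR may differ.

```cpp
// Before this PR, wait() always passed NULL (block indefinitely).
// After, callers may bound the sleep, e.g. when an idle callback is set.
void wait(const State& expected_state, const timespec* timeout = NULL) {
    futex_wait_private(&_pending_signal, expected_state.val, timeout);
}
```

Defaulting the new parameter to NULL keeps every existing call site compiling and behaving exactly as before.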
Side effects:
- Performance effects: No
- Breaking backward compatibility: No
Check List:
- Please make sure your changes are compilable.
- When providing us with a new feature, it is best to add related tests.
- Please follow Contributor Covenant Code of Conduct.
If the request is submitted to the local io_uring instance in pthread1, this bthread may be scheduled onto pthread2 later. In this case, pthread1 still needs to reap the corresponding CQE and then notify it.
I am not sure if I understand your comment correctly.
- bthread1 (under taskgroup1/pthread1) submits to iouring1 (which is bound to taskgroup1)
- bthread1 calls `butex.wait`, for a future wake-up
- inside taskgroup1's idle function, it reaps iouring1 and gets the CQE, then notifies bthread1 via `butex.signal`

I assume the third step will put bthread1 into the current task group's (taskgroup1's) local rq_, and taskgroup1 will pop rq_ inside its main loop instead of using remote_rq_ (which may involve a thread futex).
I am new to brpc, so correct me if I understand incorrectly, thanks.
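The idle-function side of that flow might look like the following pseudocode. `io_uring_peek_cqe`, `io_uring_cqe_get_data`, and `io_uring_cqe_seen` are real liburing calls; `ReapAndWake`, `IoRequest`, and the butex field are hypothetical, and the real butex plumbing in brpc is more involved.

```cpp
// Pseudocode only: runs inside taskgroup1's idle callback, on pthread1.
void ReapAndWake(io_uring* ring) {
    io_uring_cqe* cqe = NULL;
    while (io_uring_peek_cqe(ring, &cqe) == 0) {  // non-blocking reap
        IoRequest* req = (IoRequest*)io_uring_cqe_get_data(cqe);
        req->done = true;
        // Wakes bthread1 from the same pthread, so it should land in
        // taskgroup1's local rq_ rather than a remote queue.
        butex_wake(req->butex);
        io_uring_cqe_seen(ring, cqe);
    }
}
```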
BTW, the idle function here is only useful when there are just a few requests, to make sure we can reap the last bthread's completion. Under heavy concurrency, the working bthreads will reap previously submitted IOs instead of waiting for the idle function. So even if we cannot avoid cross-thread wake-ups inside the idle function, it will be OK.
I believe you're trying to create a run-to-completion model. In a bRPC scenario, the easiest ways to implement this model seem to be:
- Using RDMA's polling mode
- Modifying TCP's epoll_wait to implement polling

Other methods, as I understand them, are event-triggered; io_uring can also achieve event triggering. The network uses an event-triggered approach, while the storage uses a polling approach, which seems to mismatch the models.
I disagree with the notion that asynchronous storage requests under io_uring require a one-loop(thread)-per-core concept. Because bthreads lack scheduler pause points, we can only simulate async operations across other threads, leading to greater overhead. Essentially, we are trying to make bthread worker CPU resource utilization more efficient under async IO.
What I mean is that in the io_uring scenario, if polling mode is not applicable, eventfd can be used to register io_uring events with epoll. I think the efficiency problem of the bthread scheduling model is a common issue unrelated to io_uring.
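For reference, the eventfd route mentioned here can be sketched without io_uring itself: register an eventfd with epoll, and in the real setup liburing's `io_uring_register_eventfd()` would make the kernel bump that fd on every CQE. The demo below simulates the completion with a plain `write`; it is Linux-only, with error handling omitted, and `EventfdEpollDemo` is an invented name.

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>

// Returns the number of ready epoll events after one simulated completion.
int EventfdEpollDemo() {
    int efd = eventfd(0, EFD_NONBLOCK);
    int ep = epoll_create1(0);
    epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = efd;
    epoll_ctl(ep, EPOLL_CTL_ADD, efd, &ev);
    // Real setup: io_uring_register_eventfd(&ring, efd) makes the kernel
    // write to efd on each CQE; here we simulate a completion by hand.
    uint64_t one = 1;
    (void)write(efd, &one, sizeof(one));
    epoll_event out;
    int n = epoll_wait(ep, &out, 1, 100 /* ms */);
    close(ep);
    close(efd);
    return n;  // 1: the eventfd became readable
}
```

This is how io_uring completions can be folded into an existing epoll-driven event dispatcher without any polling thread.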
The reason I added an idle function here is that we want a chance to run some user-defined code during the idle time of the task group's pthread worker. Reaping io_uring CQEs is one of its use cases; we can also use this mechanism for other purposes:
- Reaping completions of async task calls (e.g., RocksDB operations in another thread pool, offloaded async compute functions)
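As a toy illustration of that second use case (all names invented, and `std::thread` standing in for the offload pool), the idle callback can simply drain a completion queue that other threads fill:

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Completions posted by offload threads (e.g. a RocksDB-style pool) and
// drained by the worker's idle callback -- a toy model, not brpc code.
class CompletionQueue {
public:
    void Post(std::function<void()> done) {
        std::lock_guard<std::mutex> g(mu_);
        q_.push_back(std::move(done));
    }
    // Intended to be called from the task group's idle callback:
    // runs every pending continuation on the worker's own thread.
    int Drain() {
        std::deque<std::function<void()>> local;
        {
            std::lock_guard<std::mutex> g(mu_);
            local.swap(q_);
        }
        for (auto& f : local) f();
        return (int)local.size();
    }
private:
    std::mutex mu_;
    std::deque<std::function<void()>> q_;
};

// Offload two tasks from another thread, then drain from the idle side.
int DemoDrainCount() {
    CompletionQueue cq;
    std::thread pool([&] {
        cq.Post([] {});  // e.g. a RocksDB Get finished
        cq.Post([] {});  // e.g. an offloaded compute function finished
    });
    pool.join();         // ensure both completions are posted
    return cq.Drain();
}
```

Because the drain runs on the worker's own thread, any bthreads it wakes stay in that worker's local queue, which is exactly the point of the idle hook.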
Yes, I know you want a mechanism to harvest asynchronous responses, and this PR https://github.com/apache/brpc/pull/2560 is actually designed to support this scenario. It doesn't require modifying the bthread scheduling strategy.
IIUC, the patch you referred to is meant to register a different event dispatcher, which is useful, but cannot achieve the goal of harvesting results on the same pthread/task group.
Correct me if I understand incorrectly, thanks.