ARROW-17509: [C++] Simplify async scheduler by removing the need to call End
This call to end was a foot-gun because failing to call End on a scheduler would lead to deadlock. This led to numerous situations where one would accidentally fail to call End because of some kind of error or exceptional behavior.
Now the call to end is no longer required. Schedulers are semantically divided (there aren't actually three different classes, just three different supported use cases) into three different types:
- Destroy-when-done schedulers destroy themselves whenever they have finished running all of their tasks. These schedulers require an initial task to bootstrap the scheduler and then that task (and it's subtasks) can add more sub-tasks. This is the simplest and most basic kind of scheduler and the top-level scheduler must always be this kind of scheduler.
- Peer schedulers will be ended when their parent scheduler runs out of tasks. These are useful for sinks which get tasks on a push-driven basis and don't have all their tasks up front. However, when the plan is done, we know for certain no more tasks will arrive.
- Eager peer schedulers are the same as peer schedulers but they might be eagerly ended before their parent scheduler is finished. The file queue in the dataset writer is a good example. It receives tasks in an ad-hoc fashion so it is a peer scheduler. However, once we have written max_rows rows to a file we can end it early (no need to wait for the plan to finish). However, if we forget to end it early (or fail to due so in some error scenario) it will still end when the plan ends.
https://issues.apache.org/jira/browse/ARROW-17509
:warning: Ticket has not been started in JIRA, please click 'Start Progress'.
CC @save-buffer would appreciate a look
A different interface that could be simpler is to simply change the meaning of End to mean "End when empty but allow adding more tasks" instead of "End when empty but disallow adding more tasks".
You are now allowed to add tasks after a scheduler is "ended". The top-level scheduler basically starts "ended" (after the initial task runs). If you don't use or need sub-schedulers then we have what you describe.
However, there are some cases where we don't want a sub-scheduler to end even though it has run out of tasks. For example, when writing to a file. We want to close the file when all write tasks have finished. However, we don't know, at the start, how many write tasks there will be, and we don't have all of them (sometimes InputReceived will add another write task and sometimes it won't, depends on partitioning).
That being said, we still want to end the file writer no matter what when the plan itself is out of tasks. At that point there is no possible way we could add more write tasks so the fact that it hasn't been ended is probably a bug or an error condition we don't explicitly handle and so we should just abort what we were doing and close the file.
- [x] Reminder to myself to update the docs. They still mention that you don't need to dispose the holder but you do need to do so.
- [x] Create a follow-up PR to add deadlock detection
I reworked everything again so the above comments aren't really valid. I've reworked the PR description to reflect the new behavior.
Benchmark runs are scheduled for baseline = a590b00b7af552ec20a13c46c0d578b211bac0a0 and contender = 3da803db456536782e6ad1cd6cb4f5d08d6a5d6a. 3da803db456536782e6ad1cd6cb4f5d08d6a5d6a is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] ec2-t3-xlarge-us-east-2
[Failed :arrow_down:0.58% :arrow_up:0.17%] test-mac-arm
[Finished :arrow_down:0.27% :arrow_up:0.0%] ursa-i9-9960x
[Finished :arrow_down:1.06% :arrow_up:0.28%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 3da803db ec2-t3-xlarge-us-east-2
[Finished] 3da803db test-mac-arm
[Finished] 3da803db ursa-i9-9960x
[Finished] 3da803db ursa-thinkcentre-m75q
[Finished] a590b00b ec2-t3-xlarge-us-east-2
[Failed] a590b00b test-mac-arm
[Finished] a590b00b ursa-i9-9960x
[Finished] a590b00b ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java