
Idea: Concurrency manager

Open joshring opened this issue 1 year ago • 8 comments

Inspired by the article: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

But if you have threads (green- or OS-level), you don’t need to do that. You can just suspend the entire thread and hop straight back to the OS or event loop without having to return from all of those functions.

Go is the language that does this most beautifully in my opinion. As soon as you do any IO operation, it just parks that goroutine and resumes any other ones that aren’t blocked on IO.

Inspired by scoped allocators:

DynamicArenaAllocator dynamic_arena;
dynamic_arena.init(1024);
mem::@scoped(&dynamic_arena)
{
    // This allocation uses the dynamic arena
    Foo* f = malloc(Foo);
};
// Release any dynamic arena memory.
dynamic_arena.destroy();

An opt-in scope with a job scheduler which can spin up and pause a "thread-like" on IO or other interrupts (?) would be really interesting.

  • Similar to an allocator like the DynamicArenaAllocator, the job scheduler's behaviour could be configurable to suit the application (see variables below)
  • Only exists in a scope
  • E.g. how aggressively it pauses jobs would trade latency against throughput
  • How many "thread-like"s to create

Some pseudocode of network-based concurrency:

concurrency::AsyncIO asyncio;
concurrency::SchedulerWorkStealing work_stealing;
asyncio.init(thread_type: os::GreenThread, timeout: 500, clock_cycle_time: 22, scheduler: work_stealing, num_threads: 4 * NUM_CPU);
concurrency::@scoped(&asyncio)
{
    // Network requests here
};
asyncio.destroy();

Some pseudocode of long-running-job based concurrency.

It should also support workloads of a very different nature, e.g. high-performance computing job scheduling:

concurrency::JobQueue job_queue;
concurrency::SchedulerFIFO fifo_scheduler;
job_queue.init(thread_type: os::Thread, timeout: 100_000, scheduler: fifo_scheduler, num_threads: 10_000);
concurrency::@scoped(&job_queue)
{
    // HPC compute tasks here
};
job_queue.destroy();

Variables to consider

latency, throughput, cross-job synchronisation, number of jobs, job length, job scheduling

Platform constraints

platform memory amount, platform memory bandwidth, platform memory access times if NUMA, platform CPU/GPU, platform network

Task constraints

  • CPU time limit
  • Memory usage limits
  • Time before timeout
  • Access to particular subsystem, eg network may be restricted completely
  • Access to particular subsystem may need to wait for other tasks to complete.
  • Cross-task communication

Cross task communication

  • Barriers force tasks to sync and then exchange information, e.g. if a dependency is needed to proceed.
  • Accumulate work from faster tasks into an in-memory or disk-based buffer, to be processed by the slower task when ready, e.g. emails to send get queued into a list by an API
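
The second pattern can be sketched with a bounded queue, here in Python for brevity since the proposed C3 API does not exist yet (all names are illustrative):

```python
import queue
import threading

# Sketch of the buffering pattern above: a fast producer queues work
# (e.g. emails submitted via an API) into a bounded in-memory buffer;
# a slower consumer drains it when ready. Illustrative only, not a
# proposed C3 API.
buf = queue.Queue(maxsize=8)

def producer():
    for i in range(3):
        buf.put(f"email-{i}")   # blocks if the buffer is full
    buf.put(None)               # sentinel: no more work

sent = []
threading.Thread(target=producer).start()
while (msg := buf.get()) is not None:
    sent.append(msg)            # the slow task processes at its own pace

print(sent)  # ['email-0', 'email-1', 'email-2']
```

The bounded `maxsize` is what provides backpressure: when the buffer is full, the faster task parks instead of growing memory without limit.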

Task groups (just a kind of task)

  • Groups of tasks, where one task depends on another eg fetching details from a database to use in an API request. Expression blocks or functions might be a nice way to express this
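
A dependency inside a task group can be sketched with futures, here in Python (the "database fetch" and "API request" lambdas are stand-ins for real work, not a proposed C3 API):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of a task group: one task's result feeds the next, e.g.
# fetch details from a database, then use them in an API request.
with ThreadPoolExecutor(max_workers=2) as pool:
    details = pool.submit(lambda: {"user_id": 42})   # "database fetch"
    user = details.result()                          # wait on the dependency
    request = pool.submit(lambda: f"GET /users/{user['user_id']}")
    result = request.result()

print(result)  # GET /users/42
```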

Task priority

  • Task priority: some tasks are critical within a deadline, others are optional within a deadline, e.g. cancel the optional ones to get the critical ones done in time.
  • Cancellable and non-cancellable tasks: some tasks should not be cancelled, while others are OK to cancel, based on a message sent to the task or decided centrally?
  • Policy on cancellation: restart or do not restart, postpone for X time, exponential backoff, etc.
  • Interrupts
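
A per-task cancellation and restart policy can be sketched as follows, in Python (all names and numbers are illustrative, not a proposed C3 API):

```python
import threading

# Sketch of a retry policy: restart with exponential backoff, postpone
# between attempts, and honour a cancellation flag checked between runs.
def run_with_policy(task, cancel, max_attempts=5, base_delay=0.01):
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if cancel.is_set():
            return "cancelled"
        try:
            return task()
        except RuntimeError:
            if attempt == max_attempts:
                return "failed"
            cancel.wait(delay)   # postpone; wakes early if cancelled
            delay *= 2           # exponential backoff

attempts = 0
def flaky():
    global attempts
    attempts += 1
    if attempts < 3:
        raise RuntimeError("transient error")
    return "ok"

result = run_with_policy(flaky, threading.Event())
print(result, "after", attempts, "attempts")  # ok after 3 attempts
```

Using `Event.wait(delay)` instead of `sleep` means a postponed task can still be cancelled promptly, which matters for the "critical vs optional within a deadline" case above.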

Fault tolerance - How to handle failed tasks

  • Task must have Excuse handling code
  • Common: report the Excuse, repeat the task, fail the task, reach consensus in distributed systems.

Scheduler types

  • Work stealing (levelling out the workload across threads)
  • Realtime deterministic scheduler (deadline focused)
  • FIFO
  • LIFO
  • Distributed systems
  • NUMA system
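
At their simplest, FIFO and LIFO differ only in which end of the job queue they dispatch from; a minimal Python sketch (illustrative, not a proposed API):

```python
from collections import deque

# Dispatch the same job queue under a FIFO or LIFO policy.
def dispatch_order(jobs, policy):
    queue = deque(jobs)
    order = []
    while queue:
        order.append(queue.popleft() if policy == "fifo" else queue.pop())
    return order

print(dispatch_order(["a", "b", "c"], "fifo"))  # ['a', 'b', 'c']
print(dispatch_order(["a", "b", "c"], "lifo"))  # ['c', 'b', 'a']
```

Work stealing, deadline, and NUMA-aware scheduling then layer placement and rebalancing decisions on top of a base policy like these.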

So this abstraction should work across the range of use-cases: from an embedded system to a video game, a webserver, a NUMA system, or a distributed system.

joshring avatar Oct 09 '24 12:10 joshring

Nice to see a discussion on this. My main concern about just having the usual threads is that we can have connections that last for quite a while and, even worse, just hang there. I've dealt with production servers that had hundreds of websocket connections just sitting there, and other connections that were long polling. Creating a thread for each blocking connection gets very heavy.

I would also like to add another article to the discussion: https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/.

Now I don't think the solution needs to be the async/await that we see in other languages, but having a good way to approach this problem would be fantastic. The default seems to be just callbacks, but I'm not a fan of that.

ygorpontelo avatar Oct 15 '24 22:10 ygorpontelo

I've worked with different solutions. Note that a blocking thread per connection is only workable if you only either read or write. If you need both, you end up with 1 or 2 extra threads(!) (where the last thread is there to control the other two...)

In that case, I almost always just write a single thread that does a select or similar and then dispatches to event queues backed by a thread pool.

Now it's common to just use lambdas instead of proper events. Indeed, for something general that's probably the only really flexible solution in a lot of languages. However, this just spreads code EVERYWHERE, besides the problem of having to use callbacks.

The design I keep coming back to because it just works so well, is to pass events to event handlers, where the event handlers keep the implementation internal to itself, and the event message becomes the async "call". So an event is then an enum value + a payload. This allows very nice features such as:

  1. Event filtering (discard events that are only valid for an earlier state)
  2. Delegating (the event can be passed to a delegate which handles the event, the delegate can be swappable)
  3. Introspection (see what events are queued + their contents)
  4. Record and replay (storing events, then replaying particular scenarios, e.g. to reproduce bugs)
  5. Shallow call stack
  6. The actual code running on event queue XYZ is known (as opposed to queuing a closure, where the closure merely borrows the event queue's thread).

All of these are very beneficial, even though it looks a bit primitive compared to closures. It's a much more solid approach.

However, this is for applications where there are simultaneous read/writes, for example like in a game.
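
To make the enum-plus-payload design above concrete, here is a minimal sketch in Python (the event kinds, handler, and filtering rule are invented for illustration):

```python
from dataclasses import dataclass
from enum import Enum, auto
from queue import Queue

# An event is plain data: an enum value plus a payload. Handlers keep
# their implementation private, and stale events can be filtered by state.
class EventKind(Enum):
    CONNECT = auto()
    DATA = auto()
    DISCONNECT = auto()

@dataclass
class Event:
    kind: EventKind
    payload: object

class ConnectionHandler:
    def __init__(self):
        self.connected = False
        self.received = []

    def handle(self, ev):
        # Event filtering: DATA is only valid after CONNECT.
        if ev.kind is EventKind.DATA and not self.connected:
            return
        if ev.kind is EventKind.CONNECT:
            self.connected = True
        elif ev.kind is EventKind.DATA:
            self.received.append(ev.payload)
        elif ev.kind is EventKind.DISCONNECT:
            self.connected = False

events = Queue()
events.put(Event(EventKind.DATA, "dropped"))   # arrives before CONNECT
events.put(Event(EventKind.CONNECT, None))
events.put(Event(EventKind.DATA, "hello"))

handler = ConnectionHandler()
while not events.empty():
    handler.handle(events.get())   # queued events remain introspectable

print(handler.received)  # ['hello']
```

Because events are plain data rather than closures, the queue contents can be inspected, logged, and replayed, which is where points 3 and 4 above come from.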

lerno avatar Oct 16 '24 23:10 lerno

It's still a bit unclear to me what that would look like. In a simpler version of just sending requests, how would you make it available for use?

I know you said it's often not the best solution, but having something would already help. Searching for examples of these is not easy either.

ygorpontelo avatar Oct 17 '24 00:10 ygorpontelo

I came across this article during my readings. The paper talks about leveraging the compiler for better thread support, so we can address some of their main drawbacks. I think that most of the work in modern languages when it comes to async is to approach a procedural paradigm, something that threads already have by default. It could be a very good alternative to the "main way" people implement async.

Would love to hear more opinions on the matter, especially from @lerno for the compiler support part.

ygorpontelo avatar Jan 14 '25 17:01 ygorpontelo

What in particular would you want, @ygorpontelo?

lerno avatar Jan 17 '25 19:01 lerno

Having some kind of thread pool to distribute work. I'm more interested in handling network requests, but past a certain point just spawning more threads doesn't work.

There are languages that provide "green threads" so they are cheaper than just using normal ones. I could construct it using events, but the API from threads is much more natural and easier to maintain. Something like @joshring initially proposed would be great, at least in the usage.

ygorpontelo avatar Jan 17 '25 20:01 ygorpontelo

I'll add an example EventHandler runner, which might be useful.

lerno avatar Jan 23 '25 10:01 lerno

This might be relevant for the discussion as well: https://www.rfleury.com/p/multi-core-by-default

lerno avatar Oct 10 '25 17:10 lerno