frankenphp icon indicating copy to clipboard operation
frankenphp copied to clipboard

feat: task workers

Open AlliBalliBaba opened this issue 5 months ago • 19 comments

This PR adds a new type of worker thread 'task workers'

caddyfile:

frankenphp {
    task_worker path/to/worker/file.php 10 # same config as regular workers
}

from a regular or worker PHP thread:

frankenphp_handle_task('my task'); # send a task

inside of the task worker:

# equivalent to frankenphp_handle_request(), but handling tasks instead
while(frankenphp_handle_task(string $task) { 
    echo $task; // output: 'my task'
}

Allows:

  • Offloading work to a separate threadpool that is started at Init()
  • Just running any script in a loop (even without calling frankenphp_handle_task)
  • Interacting with threads from go via extensions (#1795)

#1795 is also the reason why this feature is marked as experimental for now. Currently tasks are only passed as strings, but it would theoretically be possible to pass anything, even closures (while keeping thread safety in mind).

AlliBalliBaba avatar Sep 17 '25 20:09 AlliBalliBaba

Couldn't we merge this with #1795?

I'm pretty sure we can optimize #1795 to not parse the request if not provided by adapting what you have done.

I like the function allowing to process a message asynchronously. Something like that was on my todolist.

dunglas avatar Sep 18 '25 06:09 dunglas

Yeah we can merge this with the modular threads logic 👍 feel free to adjust the worker as you need it

AlliBalliBaba avatar Sep 18 '25 10:09 AlliBalliBaba

This looks great. That will replace my need for the caddy-supervisor module.

henderkes avatar Sep 18 '25 13:09 henderkes

There are still quite a few points open:

  • passing any zval to dispatchTask
  • returning a zval at the end of the dispatches task
  • documentation
  • using the same logic for 'worker extensions'
  • mechanisms to handle overflowing queues (maybe?)

I'm trying right now to pass any zval. After looking at how it's done in the parallel extension, we probably need to deep copy the values, though it doesn't have to be quite that complicated. Having a secure mechanism to pass zvals between threads might also be quite nice for extensions in general.

We can go ahead with the current state as an initial version and work from there or I can do everything as a big chunk in this PR.

AlliBalliBaba avatar Sep 20 '25 22:09 AlliBalliBaba

I'm trying right now to pass any zval.

I'd think that just like in the parallel extension, it might have some of the same issues? Such as trying to pass an exception through? Can we just use the parallel extension to lean on pre-existing code?

withinboredom avatar Sep 21 '25 15:09 withinboredom

Yeah we probably can, C dependencies are just kind of cumbersome to install and hard to test. We could allow passing anything if ext/parallel is installed, but only allow a string otherwise.

Easiest way is to just go with strings for now.

AlliBalliBaba avatar Sep 21 '25 19:09 AlliBalliBaba

Cannot we copy/paste the code (with credits)? It would be better to have a core feature depending on a 3rd-party extension.

dunglas avatar Sep 22 '25 12:09 dunglas

Copy pasting the code is also not trivial, because of globals/init/shutdown. I'll try to create a minimal copy.c, would be nice to at least be able to pass on arrays and objects since serializing is very inefficient.

AlliBalliBaba avatar Sep 24 '25 15:09 AlliBalliBaba

Haven't forgotten about this PR, copying zvals directly is just more complex than I anticipated. They probably need to go to persistent memory and then back to local memory. Still probably an order of magnitude more efficient than serializing.

AlliBalliBaba avatar Oct 07 '25 16:10 AlliBalliBaba

Edit: disregard what I said. I was with another issue in my thoughts.

henderkes avatar Oct 07 '25 17:10 henderkes

The parallel extension's copy mechanism's main deal breaker is that it cannot copy internal objects like \DateTime. And these limitations might only lead to confusion (like in this thread).

That's why I think leaving it up to the user to serialize the task might be the best solution still for now, since then the limitations are clearer.

I still would like to optimize passing zvals, but this is now how the initial task worker implementation would look like.

Some subtleties:

  • it's also possible to pass args to task workers, like they would be passed to a cli script.
  • dispatch_task will throw an error if the queue size is larger than 1500 entries (should this be configurable, should it throw an error?)
  • task workers will first complete all tasks before shutdown
  • task workers will restart on 'watch' like regular workers
  • task workers will consume threads from the initial num_threads pool
  • task workers will not scale automatically like other workers (would only make sense with a per-worker max_threads)
  • task completion can currently be awaited on the go side, but not on the PHP side
  • task worker execution time is not limited on default and must be set via set_time_limit() (should it be limited?)
  • 'max tasks before restart' is currently tracked on the PHP side (might make sense to put this on the go side? like in #1867)
  • task workers will write all output to logs
  • no metrics are currently implemented

AlliBalliBaba avatar Oct 07 '25 20:10 AlliBalliBaba

I think most of them are obvious and serialisation is fine.

task completion can currently be awaited on the go side, but not on the PHP side

Unfortunate, but understandable.

task worker execution time is not limited on default and must be set via set_time_limit() (should it be limited?)

Makes sense, configuration via php script seems sensible enough.

'max tasks before restart' is currently tracked on the PHP side (might make sense to put this on the go side?

Like I said in the other thread I'd lean more towards configuration on the FrankenPHP side, but since the choice is to keep it in userland there, we should keep it in userland here too. Consistency is more important than personal preference.

henderkes avatar Oct 07 '25 21:10 henderkes

I implemented an external extension doing the same thing (I'll publish it soon, but I can give you access if needed).

I made the HTTP request optional in https://github.com/php/frankenphp/pull/1910

IMHO, it's better to have that as an extension than in the core. WDYT?

dunglas avatar Oct 08 '25 15:10 dunglas

@dunglas external workers are one of the reasons I created this PR and I'd also like to merge the 'external workers' logic with task workers.

Workers should do one thing: handle requests. frankenphp_handle_request sometimes not handling a request and instead receiving a value mangles these 2 concepts too much together. And it will just become harder to untangle in the future. Even with #1910 there's still a lot of globals resetting happening in the background.

Tasks workers offer a cleaner and more performant way to just pass a message to a PHP thread, no side effects.

If you think that this PR tries to do too much at once, we can also remove the task_worker caddy directive for now.

AlliBalliBaba avatar Oct 08 '25 17:10 AlliBalliBaba

The two concepts look very similar to me: a worker waits for a request (HTTP or anything else) and may respond. I agree that we shouldn't do extra work if not necessary, but I'm pretty sure that it can be achieved by refining the current infrastructure introduced by Rob. I quite like the current public API for extensions. If it's possible to keep it on top of what you propose, why not.

I'm not sure if the PHP function and the task_worker directive should live in core. I would like to keep the core as focused as possible. This feature could quickly become complex (queue size, dispatching algorithms, distributed queuing...). As this can be implemented as an extension (I'll release one soon), IMHO it should be an external extension.

dunglas avatar Oct 09 '25 12:10 dunglas

To iterate on this, I released https://github.com/dunglas/frankenphp-queue

After double thinking about it, I think I get where you want to go.

I still think that we could simplify this code/merge it with the work done on extension worker, but there is indeed benefits to have this in core.

What I would like to have is:

  1. A way to register non-HTTP workers easily. IMHO, the current worker config is almost good enough. To run background scripts such as Symfony's messenger:consume, the only thing we're missing is a a flag to mark the script as always ready. Maybe a new ready directive in the the worker options?

  2. The same public (PHP and Go) API for HTTP and non-HTTP request handling/message passing. If possible I would like to merge as much code as possible too.

  3. A way to dispatch a request/message to a specific handler, like:

     function frankenphp_send_request(mixed $data, ?string $workerName): mixed {}
    
  4. As done in my extension, a way to configure the queue size (buffered chans)

WDYT?

dunglas avatar Oct 10 '25 15:10 dunglas

A public api for just handling PHP requests already exists though. Can it not just be achieved by calling this with a custom worker/request/responsewriter?

r, _:= frankenphp.NewRequestWithContext(r, frankenphp.WithWorkerName("...")))
frankenphp.ServeHTTP(rw, r)
doSomethingAfterRequest()

What's missing is a way to just handle a message. I still think a separate frankenphp_handle_task or frankenphp_handle_message would leave less room for confusion and more room for optimizations. Mixing the two concept leads to other issues like unnecessary globals resetting or race conditions for the return value if a request gets flushed.

I can remove the args option, then the public api just has 1 additional AsTaskWorker() method, otherwise the config is the same. Having a buffered chan for the queue probably makes sense 👍.

Btw your extension has the same problem with copying zvals across threads. The zval gets arena-deallocated as soon as the thread restarts/shuts down and it cannot mix with other PHP threads. So it must be converted to an intermediary representation for queuing. Maybe the answer is to just have types.go support PHP objects as well and heavily optimize that.

AlliBalliBaba avatar Oct 11 '25 17:10 AlliBalliBaba

I guess the only case where it would make sense to use frankenphp_handle_request() for both HTTP requests and messages is if you actually want a single thread to do both simultaneously. IMO a bad separation of concerns, but I'd be fine with it if you have an actual use case.

AlliBalliBaba avatar Oct 11 '25 21:10 AlliBalliBaba

Just to clarify again where we go from here:

  • would you be fine with separate apis for messages and for requests?
  • would you be fine with a separate thread type for messages? (and separate registration in the caddyfile, same config though)
  • How would we combine the logic? Merge the PRs and then work in a separate PR or do it all here?

AlliBalliBaba avatar Oct 12 '25 21:10 AlliBalliBaba