Graceful termination of Queue::listen() while scaling down
Hello,
We are running queue workers in a Kubernetes environment where pods are short-lived and can be interrupted at any time. Currently, the yii\queue\cli\Queue::listen() method continuously listens for new messages until it receives a termination signal (SIGTERM, SIGINT, or SIGHUP). Related issue: https://github.com/yiisoft/yii2-queue/issues/399
When we push a long-running job to the queue and send a termination signal (e.g., Ctrl+C), the worker behaves correctly by finishing the current job before stopping. However, after the job is processed, the listen() method hangs instead of exiting; only once a new message is pushed to the queue does the process stop.
Expected behavior for graceful termination when scaling down workers:
- Termination signal + empty queue → the worker should stop immediately.
- Termination signal during job processing → the worker should complete the current job and stop without continuing to listen for new messages.
Is there a way to achieve this behavior natively in Yii2 Queue?
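For illustration, the consume-loop shape we are after would look roughly like this if written directly against enqueue/amqp-lib, outside of yii2-queue (this is only a sketch of the idea, not the actual listen() implementation; the connection options and handler body are placeholders):

<?php
// Sketch only: a signal-aware consume loop, assuming ext-pcntl is available
// and enqueue/amqp-lib is installed. Not the yii2-queue implementation.

use Enqueue\AmqpLib\AmqpConnectionFactory;
use Interop\Amqp\AmqpConsumer;
use Interop\Amqp\AmqpMessage;

pcntl_async_signals(true);

$stop = false;
foreach ([SIGTERM, SIGINT, SIGHUP] as $signal) {
    pcntl_signal($signal, function () use (&$stop) {
        $stop = true; // only record the request; the current job still finishes
    });
}

$context = (new AmqpConnectionFactory(['host' => 'rabbitmq', 'port' => 5672]))->createContext();

$queue = $context->createQueue('event_sync_queue');
$consumer = $context->createConsumer($queue);

$subscriptionConsumer = $context->createSubscriptionConsumer();
$subscriptionConsumer->subscribe($consumer, function (AmqpMessage $message, AmqpConsumer $consumer) {
    // ... run the job handler here ...
    $consumer->acknowledge($message); // ack only after the job has finished
    return true;
});

while (!$stop) {
    // Consume in short slices instead of blocking forever, so a pending
    // SIGTERM is noticed even while the queue is empty.
    $subscriptionConsumer->consume(1000); // timeout in milliseconds
}

That is roughly the behavior we would like listen() to have out of the box.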
I think the behavior you are describing is correct. How do you run the queues? What's in your entry script/cmd?
The workers are running as a Kubernetes Deployment, scaled using HPA (Horizontal Pod Autoscaler).
The component:
'components' => [
    'queueService' => [
        'class' => yii\queue\amqp_interop\Queue::class,
        'vhost' => '',
        'host' => 'rabbitmq',
        'port' => 5672,
        'user' => '',
        'password' => '',
        'exchangeName' => 'event_sync_exchange',
        'queueName' => 'event_sync_queue',
        'driver' => yii\queue\amqp_interop\Queue::ENQUEUE_AMQP_LIB,
    ],
]
The command and args for the container are:
Command: /bin/bash
Args: -c php yii queue-service/listen
During downscaling, even with a properly configured terminationGracePeriodSeconds, the worker gets stuck and does not stop gracefully. Instead, it waits for the full termination period and ultimately ends with a SIGKILL.
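For reproduction, the handler is essentially a long-running job along these lines (a sketch; the class name and duration are made up):

<?php

namespace app\jobs;

use yii\queue\JobInterface;

/**
 * Hypothetical long-running job used to reproduce the problem: sending SIGTERM
 * while execute() sleeps shows the "finish the current job, then hang in
 * listen()" behaviour described above.
 */
class SlowSyncJob implements JobInterface
{
    /** @var int how long the fake work takes, in seconds */
    public $seconds = 120;

    public function execute($queue)
    {
        sleep($this->seconds); // stand-in for the real synchronization work
    }
}

It is pushed with something like Yii::$app->queueService->push(new \app\jobs\SlowSyncJob()).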
EDITED: added component, changed the args.
What's rabbit/leaflets?
I've updated the original comment.
OK. That looks valid and is likely a bug. Can't dig into it right now myself though :(
This is a known issue for long handlers when running in K8S. It can also occur when using a RabbitMQ cluster and the accompanying HAProxy. The thing is that you need to tell the server that your connection is still alive. You can use the heartbeat option for this (I didn't see it in your config).
[
    ...,
    'heartbeat' => 10, // seconds
]
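Applied to the component from the original post, that would look something like this (10 is just an example value):

'queueService' => [
    'class' => yii\queue\amqp_interop\Queue::class,
    // ... the existing options from above ...
    'heartbeat' => 10, // send AMQP heartbeat frames so an idle connection is not dropped
],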
However, this may still not solve the connection problem with K8s and HAProxy: your handler can fail with an error, or lose the connection to the server and not reconnect (for example, when the idle timeouts on the K8s Ingress and on HAProxy differ, the heartbeat frames sent at the configured interval may not arrive within the proxy's timeout window).
You can additionally override setupBroker() to handle such situations, e.g.:
protected function setupBroker(): void
{
    if ($this->setupBrokerDone) {
        return;
    }

    // Count reconnect attempts across recursive calls.
    static $reconnectAttempt = 0;

    try {
        parent::setupBroker();
    } catch (\Throwable $e) {
        if ($reconnectAttempt < $this->retries) {
            $this->close();
            $reconnectAttempt++;
            if ($this->retryInterval > 0) {
                usleep($this->retryInterval);
            }
            // Reopen the connection and retry the broker setup.
            $this->open();
            $this->setupBroker();
        } else {
            throw $e;
        }
    }
}
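For context, retries and retryInterval in the snippet above are not properties of yii\queue\amqp_interop\Queue; they are assumed to be custom options on your own Queue subclass, declared roughly like this (names and defaults are illustrative):

<?php

namespace app\components;

use yii\queue\amqp_interop\Queue;

/**
 * Sketch of the subclass the setupBroker() override would live in.
 * The property names and defaults are assumptions, not part of yii2-queue.
 */
class ReconnectingQueue extends Queue
{
    /** @var int how many times setupBroker() may retry after a failed connection */
    public $retries = 3;

    /** @var int pause between retries, in microseconds (value passed to usleep()) */
    public $retryInterval = 500000;

    // ... the setupBroker() override shown above goes here ...
}

The component config would then point 'class' at this subclass instead of yii\queue\amqp_interop\Queue::class.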
I'm not sure if I fully understand your point. If I got it right, you're saying that when there’s an Ingress or HAProxy in front of RabbitMQ, there’s a chance the proxy (Ingress or HAProxy) might close the connection, and the heartbeat could detect this and trigger a reconnect attempt?
In our case, we’re trying to address a different scenario: after receiving a SIGTERM (or any termination signal), the worker process hangs without an active connection. Setting up a heartbeat might help detect that the process is no longer alive and eventually terminate it with some delay.
However, we’re unsure whether the message being processed at that point would be properly acknowledged or redelivered. We also don’t know if using the heartbeat this way is the right approach or whether it might introduce other issues (e.g. with long-running message processing).