
queue-service workers stop processing after rabbitmq disconnection and connection reestablishment

Open DekusDenial opened this issue 3 years ago • 0 comments

What happened: A network disruption interrupted the connection to RabbitMQ. Even after the connection was reestablished, the queue workers appear to have stopped processing jobs.

timestamp,message
1653411176320,"{""level"":""info"",""message"":""queueWorker->worker[5] polling builds"",""timestamp"":""2022-05-24T16:52:56.320Z""}"
1653411176321,"{""level"":""info"",""message"":""queueWorker->worker[5] working job builds {\""class\"":\""start\"",\""queue\"":\""builds\"",\""args\"":[{\""buildId\"":45070339,\""jobId\"":925551,\""blockedBy\"":\""925551\"",\""blockedBySameJob\"":true,\""blockedBySameJobWaitTime\"":5}]}"",""timestamp"":""2022-05-24T16:52:56.321Z""}"
1653411176322,"{""level"":""info"",""message"":""45070339 | 925551 | Processing blocked-by filter"",""timestamp"":""2022-05-24T16:52:56.322Z""}"
1653411176323,"{""level"":""info"",""message"":""45070339 | 925551 | BlockedBy list:[ 'running_job_925551' ]"",""timestamp"":""2022-05-24T16:52:56.323Z""}"
1653411176324,"{""level"":""info"",""message"":""45070339 | 925551 | blockingBuildIds:[ '45067220' ]"",""timestamp"":""2022-05-24T16:52:56.323Z""}"
1653411176324,"{""level"":""info"",""message"":""Checking collapsed build for 45070339"",""timestamp"":""2022-05-24T16:52:56.324Z""}"
1653411176324,"{""level"":""info"",""message"":""lastWaitingBuild: 45070339"",""timestamp"":""2022-05-24T16:52:56.324Z""}"
1653411176324,"{""level"":""info"",""message"":""buildsToCollapse: []"",""timestamp"":""2022-05-24T16:52:56.324Z""}"
1653411176326,"{""level"":""info"",""message"":""PUT undefined"",""timestamp"":""2022-05-24T16:52:56.326Z""}"
...
...
1653411748620,"{""level"":""info"",""message"":""Disconnected from rabbitmq: Error: Heartbeat timeout
    at Heart.<anonymous> (/usr/src/app/node_modules/amqplib/lib/connection.js:427:19)
    at Heart.emit (events.js:314:20)
    at Heart.runHeartbeat (/usr/src/app/node_modules/amqplib/lib/heartbeat.js:88:17)
    at listOnTimeout (internal/timers.js:554:17)
    at processTimers (internal/timers.js:497:7)"",""timestamp"":""2022-05-24T17:02:28.619Z""}"
1653411751034,"{""level"":""info"",""message"":""PUT undefined"",""timestamp"":""2022-05-24T17:02:31.033Z""}"
1653411751808,"{""level"":""info"",""message"":""PUT /v4/builds/45072519 completed with attempts, 200, undefined"",""timestamp"":""2022-05-24T17:02:31.808Z""}"
1653411751809,"220524/170231.020, (1653411751020:sdqueuesvc-canary-554f4ccbcb-gcvxv:27:l2tjbmyr:43209) [response,api,queue] http://localhost:80: post /v1/queue/message {} 200 (789ms)"
1653411752411,"220524/170232.410, (1653411752410:sdqueuesvc-canary-554f4ccbcb-gcvxv:27:l2tjbmyr:43210) [response,api] http://localhost:80: get /v1/status {} 200 (1ms)"
...
...
1653411778733,"{""level"":""info"",""message"":""Connected to rabbitmq!"",""timestamp"":""2022-05-24T17:02:58.733Z""}"
...
...
1653411782411,"220524/170302.410, (1653411782410:sdqueuesvc-canary-554f4ccbcb-gcvxv:27:l2tjbmyr:43214) [response,api] http://localhost:80: get /v1/status {} 200 (1ms)"
1653411790792,"{""level"":""info"",""message"":""PUT undefined"",""timestamp"":""2022-05-24T17:03:10.792Z""}"

What you expected to happen: Workers should resume processing jobs as soon as the RabbitMQ connection is back.
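
To illustrate the expected behavior, here is a minimal sketch of a consumer that re-registers its subscriptions on every new connection. It is not taken from the queue-service code and does not claim to identify the root cause of this issue. `FakeConnection` and `ResilientConsumer` are hypothetical names, and `FakeConnection` is an in-memory stand-in rather than the amqplib API. The idea it models is that consumers registered on a dropped connection are gone, so a reconnect alone is not enough unless they are registered again.

```javascript
const { EventEmitter } = require('events');

// Hypothetical in-memory stand-in for a broker connection (NOT amqplib's API).
class FakeConnection extends EventEmitter {
  constructor() {
    super();
    this.consumers = new Map();
  }

  consume(queue, handler) {
    this.consumers.set(queue, handler);
  }

  // Returns false when nobody is consuming from the queue on this connection.
  deliver(queue, msg) {
    const handler = this.consumers.get(queue);
    if (!handler) return false;
    handler(msg);
    return true;
  }

  // Simulates a dropped connection: its consumers are lost with it.
  drop(err) {
    this.consumers.clear();
    this.emit('close', err);
  }
}

// Hypothetical wrapper that remembers subscriptions and replays them
// on each new connection.
class ResilientConsumer {
  constructor(connect) {
    this.connect = connect;
    this.subscriptions = new Map();
    this.connection = null;
  }

  subscribe(queue, handler) {
    this.subscriptions.set(queue, handler);
    if (this.connection) this.connection.consume(queue, handler);
  }

  start() {
    const conn = this.connect();
    this.connection = conn;
    // Reconnects immediately for brevity; a real service would back off and retry.
    conn.once('close', () => this.start());
    for (const [queue, handler] of this.subscriptions) {
      conn.consume(queue, handler);
    }
  }
}

// Usage: a job delivered after the reconnect is still processed.
const processed = [];
const consumer = new ResilientConsumer(() => new FakeConnection());
consumer.subscribe('builds', (msg) => processed.push(msg));
consumer.start();

consumer.connection.deliver('builds', 'job-1');
const old = consumer.connection;
old.drop(new Error('Heartbeat timeout'));
const deliveredAfterReconnect = consumer.connection.deliver('builds', 'job-2');
console.log(processed);
```

If the replay loop in `start()` is removed, the second `deliver` call finds no handler on the new connection, which resembles the symptom described above: the connection is back, but no jobs are processed.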

How to reproduce it:

DekusDenial, May 26 '22 18:05