cms [3.x]: Issue with server when updating search indexes

What happened?

Description

After saving entries in the CMS admin panel, a queue of jobs started building up as normal. Quite a few search index jobs came in and piled up, but by the looks of it they were being executed. Processing then stopped with about 4 "Updating search indexes" jobs left in the queue, the server dropped off, and php-fpm stopped working.

Looking in the logs, this is what I found:

WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 5 idle, and 10 total children

Steps to reproduce

Open an entry
Make some edits
Save the entry
Watch the queued jobs start

Expected behavior

The jobs should all clear one by one and not get stuck and break php-fpm.

Actual behavior

When the jobs rack up the server drops out and crashes.

Craft CMS version

Craft CMS 3.7.37

PHP version

7.3.27

Operating system and version

Linux 4.15.0-22-generic

Database type and version

MySQL 5.7.33

Image driver and version

No response

Installed plugins and versions

Amazon S3 - 1.3.0
Asset Rev - 6.0.2
AsyncQueue - 2.3.0
Blitz - 3.11.1
Breadcrumb - 1.1.0
Bugsnag - 2.1.
Commerce Variant Cloner - 1.0.0
Contact Form - 2.3.0
Craft Commerce - 3.4.13
Craft Variants - 1.0.1
Eager Beaver - 1.0.4
Field Manager - 2.2.4
Imgix - 2.1.0
Instant Analytics - 1.1.15
Many to Many Field Type - 1.0.2.2
MobileDetect - 1.0.2
Patrol - 3.1.3
Postman - 1.0
Postmark - 2.1.0
Redactor - 2.10.5
Retour - 3.1.70
Scout - 2.6.1
SEOmatic - 3.4.28
Snaptcha - 3.0.11
Stripe for Craft Commerce - 2.4.3
Twig Perversion - 2.2.0
Webperf - 1.0.27
Wordsmith - 3.3.0.1

Apr 13 '22 14:04 mdunbavan

Do you have multiple queue workers?

Apr 13 '22 19:04 brandonkelly

I'm not sure whether I do on this server. I can check, but I'd need to know the best place to look to confirm they are running. Usually I have Horizon running with my Laravel apps, which tells me which workers are running through Redis.

Apr 13 '22 19:04 mdunbavan
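For anyone else wondering where to look for queue workers, here is a minimal sketch of one way to check. It assumes any dedicated worker was started through Craft's console commands `queue/listen` or `queue/run`; the function name `filter_queue_workers` is made up for illustration.

```shell
#!/bin/sh
# Keep only process lines that look like Craft queue workers.
# Assumption: workers run as "<php> <path>/craft queue/listen" or "... queue/run".
filter_queue_workers() {
    grep -E 'craft queue/(listen|run)'
}

# List every process with its PID, owner, elapsed time, and full command line,
# then filter for worker-like entries.
if ps -eo pid,user,etime,args | filter_queue_workers; then
    echo "Queue worker process(es) found (listed above)."
else
    echo "No long-running queue worker process found."
fi
```

If nothing shows up, the queue is probably being run over web requests (Craft's default, or via the Async Queue plugin) or by a cron entry, so `crontab -l` and `systemctl list-units --type=service` are also worth a look. `php craft queue/info` should report how many jobs are waiting, reserved, and failed.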

We have potentially encountered the same issue (hard to be sure it's the cause, as we can't consistently replicate the problem). For what it's worth, we're also using the Async Queue plugin on the affected site. I'm going to be ditching the Async Queue plugin in favour of systemd workers. When doing this on other servers I've always implemented two of them, as per Andrew Welch's queue handling article. @brandonkelly Are there any downsides to having two of them, or should we just have one?

May 31 '22 10:05 thisisjamessmith

@thisisjamessmith Not necessarily, but you need to configure Craft with a custom mutex driver that can be shared across all instances, as a starting point.

May 31 '22 23:05 brandonkelly

To be clear, I'm implementing two systemd workers, both on the same server, not in a load-balanced environment. Does your comment still apply in that case? I'm not familiar at all with mutex drivers, and would like to avoid that level of customisation if possible!

Jun 01 '22 11:06 thisisjamessmith

Sorry, yeah that’s different. If they’re all running on the same server instance, you don’t need to worry about it.

Jun 01 '22 19:06 brandonkelly
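
For reference, the two-workers-on-one-server setup discussed above could be sketched roughly as follows. This is not taken from the article mentioned earlier; the helper name `render_worker_unit`, the unit name `craft-queue@.service`, the PHP binary location, the project path, and the user are all placeholders to adapt.

```shell
#!/bin/sh
# Print a systemd *template* unit for a Craft queue worker.
# Assumptions: PHP lives at /usr/bin/php, and the caller supplies the path
# to the project's "craft" executable and the user to run as.
render_worker_unit() {
    craft_path=$1   # hypothetical, e.g. /var/www/example/craft
    run_as=$2       # hypothetical, e.g. www-data
    cat <<EOF
[Unit]
Description=Craft CMS queue worker %i
After=network.target

[Service]
User=$run_as
ExecStart=/usr/bin/php $craft_path queue/listen --verbose
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}

# Print the unit to stdout so it can be reviewed before installing.
render_worker_unit /var/www/example/craft www-data
```

Saved as `/etc/systemd/system/craft-queue@.service`, the template can be started twice with `systemctl daemon-reload` followed by `systemctl enable --now craft-queue@1 craft-queue@2`; `%i` expands to the instance name after the `@`. With dedicated workers in place, it is usually worth disabling web-triggered queue runs as well (the `runQueueAutomatically` general config setting, and the Async Queue plugin if installed) so that only the daemons process jobs.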