[3.x]: Issue with server when updating search indexes
What happened?
Description
After saving entries in the CMS admin panel, jobs started queuing as normal. Quite a few search index jobs came in and piled up, but they appeared to be executing. Processing then stopped when about 4 "Updating search indexes" jobs were in the queue, the server dropped off, and php-fpm stopped working.
The logs showed the following:
WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 5 idle, and 10 total children
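The warning above points at the pool's `pm.*` settings, so a first step could be to see what they are currently set to. This is a minimal sketch, not a confirmed diagnosis: the function name `show_fpm_pm_settings` is made up for illustration, and the default path is an assumption for PHP 7.3 on a Debian/Ubuntu-style layout, so pass the real pool file if yours lives elsewhere.

```shell
# Print the active (uncommented) pm.* directives from a php-fpm pool file.
# ASSUMPTION: the default path below is a guess for PHP 7.3 on Debian/Ubuntu;
# pass the actual pool file as the first argument if it differs.
show_fpm_pm_settings() {
  pool_file="${1:-/etc/php/7.3/fpm/pool.d/www.conf}"
  # Lines starting with ';' are comments in php-fpm config and are skipped
  # because the pattern is anchored to the start of the line.
  grep -E '^[[:space:]]*pm(\.[a-z_]+)?[[:space:]]*=' "$pool_file"
}
```

Running `show_fpm_pm_settings` with no argument reads the assumed default file. In dynamic mode php-fpm expects `pm.min_spare_servers` <= `pm.start_servers` <= `pm.max_spare_servers` <= `pm.max_children`, so those are the values to compare against the "10 total children" in the warning.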
Steps to reproduce
- Open an entry
- Make some edits
- Save the entry
- Watch the queued jobs start
Expected behavior
The jobs should all clear one by one without getting stuck or breaking php-fpm.
Actual behavior
When the jobs rack up the server drops out and crashes.
Craft CMS version
Craft CMS 3.7.37
PHP version
7.3.27
Operating system and version
Linux 4.15.0-22-generic
Database type and version
No response
Image driver and version
MySQL 5.7.33
Installed plugins and versions
Amazon S3 - 1.3.0
Asset Rev - 6.0.2
AsyncQueue - 2.3.0
Blitz - 3.11.1
Breadcrumb - 1.1.0
Bugsnag - 2.1.
Commerce Variant Cloner - 1.0.0
Contact Form - 2.3.0
Craft Commerce - 3.4.13
Craft Variants - 1.0.1
Eager Beaver - 1.0.4
Field Manager - 2.2.4
Imgix - 2.1.0
Instant Analytics - 1.1.15
Many to Many Field Type - 1.0.2.2
MobileDetect - 1.0.2
Patrol - 3.1.3
Postman - 1.0
Postmark - 2.1.0
Redactor - 2.10.5
Retour - 3.1.70
Scout - 2.6.1
SEOmatic - 3.4.28
Snaptcha - 3.0.11
Stripe for Craft Commerce - 2.4.3
Twig Perversion - 2.2.0
Webperf - 1.0.27
Wordsmith - 3.3.0.1
Do you have multiple queue workers?
I'm not sure whether I have on this server. I can check, but I'd need to know where best to look to confirm they are running. Usually I have Horizon running with my Laravel apps, which tells me what workers are running through Redis.
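One rough way to check is to look at the process list for Craft queue commands. The sketch below assumes any workers were started through Craft's `queue/listen` or `queue/run` console commands. The function name `count_queue_workers` is made up for illustration, and it only counts matching command lines, so it says nothing about how the workers were launched.

```shell
# Count command lines on stdin that look like a Craft queue worker.
# ASSUMPTION: workers run "craft queue/listen" or "craft queue/run".
# The [c] bracket trick stops grep from matching its own command line.
count_queue_workers() {
  # grep -c still prints 0 when nothing matches but exits non-zero,
  # so "|| true" keeps the function's exit status clean.
  grep -E -c '[c]raft queue/(listen|run)' || true
}
```

Feed it the process list with `ps -eo args | count_queue_workers`. A result of 2 or more at the same moment suggests multiple workers. Running `php craft queue/info` from the project root should also show how many jobs are waiting, reserved, and done.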
We have potentially encountered the same issue (hard to be sure it's the cause as we can't consistently replicate the problem). For what it's worth we're also using the Async Queue plugin on the affected site. I'm going to be ditching the Async Queue plugin in favour of systemd workers. When doing this on other servers I've always implemented two of them as per Andrew Welch's queue handling article. @brandonkelly Are there any downsides to having two of them, should we just have one?
@thisisjamessmith Not necessarily, but you need to configure Craft with a custom mutex driver that can be shared across all instances, as a starting point.
To be clear, I'm implementing two systemd workers both on the same server, not a load balanced environment. Does your comment still apply in that case? I'm not familiar at all with mutex drivers, and would like to avoid that level of customisation if possible!
Sorry, yeah that’s different. If they’re all running on the same server instance, you don’t need to worry about it.
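For reference, two workers on a single server could be set up from one systemd template unit. This is only a sketch under stated assumptions, not the unit from the article mentioned above: the user `www-data`, the path `/var/www/site`, the unit name `craft-queue@.service`, and the function name `render_queue_unit` are all placeholders.

```shell
# Print a systemd template unit for a Craft queue worker.
# ASSUMPTIONS: user/group "www-data" and the path /var/www/site are
# placeholders; replace them with the values for your server.
render_queue_unit() {
  cat <<'EOF'
[Unit]
Description=Craft queue worker %i
After=network.target

[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/www/site
ExecStart=/usr/bin/php /var/www/site/craft queue/listen --verbose
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}
```

Saving the output as `/etc/systemd/system/craft-queue@.service`, then running `systemctl daemon-reload` followed by `systemctl enable --now craft-queue@1 craft-queue@2`, would start two instances of the same unit. With daemon workers in place, Craft's `runQueueAutomatically` general config setting is usually set to `false` so that web requests no longer trigger the queue.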