chain RPC restarts cause CPU utilisation problem
Context
@mfw78 reported
Summary
Found one bee node sitting at 50% CPU utilisation for no obvious reason. All the nodes had restarted about 2 days ago when I upgraded Nethermind. Some were stuck at super low depths, just smashing themselves.
One thing worth noting: this is what happens in a resource-constrained environment, i.e. the "starting inertia" in bee is extraordinarily high. But as the system stabilises, its inertia drops to near 0.
It would be great to have some logs and metrics from when this sort of thing happens.
I see this a LOT on my massively pinning (1+ TB) upload node. Each pullsync cursorHandler instance consumes copious amounts of CPU, and all new peers hit the newly started node at once. I first put in a hack to limit the cursors handler to a single concurrent instance. I have since raised that limit to 1024, because I now also wrap the pullsync/cursors database routine in a singleflight, so that multiple overlapping requests are served by a single loop over the bins.
I see similar cursorHandler loads on my smaller nodes, but they clear quickly enough to not cause an impact. But if a node has a large local pinned database, it's pretty ugly.
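For anyone who wants to experiment with the same idea, here is a rough sketch of the two pieces described above: a cap on concurrent cursor handlers plus singleflight-style coalescing of overlapping requests. This is not bee's actual code or my exact patch. `cursorServer`, `flight`, `Waiting` and the `scan` callback are all hypothetical names, and the coalescing is a small hand-rolled stand-in for `golang.org/x/sync/singleflight` so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// flight is one in-progress scan that later callers can wait on.
type flight struct {
	done    chan struct{} // closed once the scan has finished
	cursors []uint64
	err     error
	waiters int // callers that joined this scan; guarded by cursorServer.mu
}

// cursorServer (hypothetical) caps concurrent handlers and coalesces
// overlapping cursor requests into a single scan.
type cursorServer struct {
	sem  chan struct{}            // buffered channel used as a counting semaphore
	scan func() ([]uint64, error) // the expensive loop over the bins
	mu   sync.Mutex
	cur  *flight // scan currently in flight, or nil
}

func newCursorServer(limit int, scan func() ([]uint64, error)) *cursorServer {
	return &cursorServer{sem: make(chan struct{}, limit), scan: scan}
}

// Cursors returns the result of a scan, sharing one scan among all callers
// that overlap in time. The returned slice is shared, so treat it as read-only.
func (s *cursorServer) Cursors() ([]uint64, error) {
	s.sem <- struct{}{} // blocks once `limit` handlers are active
	defer func() { <-s.sem }()

	s.mu.Lock()
	if f := s.cur; f != nil {
		// A scan is already running: wait for it instead of starting another.
		f.waiters++
		s.mu.Unlock()
		<-f.done
		return f.cursors, f.err
	}
	f := &flight{done: make(chan struct{})}
	s.cur = f
	s.mu.Unlock()

	f.cursors, f.err = s.scan()

	s.mu.Lock()
	s.cur = nil // requests arriving from now on start a fresh scan
	s.mu.Unlock()
	close(f.done) // wake every caller that joined this scan
	return f.cursors, f.err
}

// Waiting reports how many callers are piggybacking on the in-flight scan.
func (s *cursorServer) Waiting() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.cur == nil {
		return 0
	}
	return s.cur.waiters
}

func main() {
	var scans atomic.Int64
	srv := newCursorServer(1024, func() ([]uint64, error) {
		scans.Add(1)
		time.Sleep(50 * time.Millisecond) // stand-in for a slow walk over the bins
		return []uint64{10, 20, 30}, nil
	})

	// Simulate many peers asking for cursors right after a restart.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _ = srv.Cursors()
		}()
	}
	wg.Wait()
	fmt.Printf("100 requests served by %d scan(s)\n", scans.Load())
}
```

With coalescing in place the semaphore limit can be generous, since a waiting handler only blocks on a channel and costs almost no CPU. The sketch does not handle a panic inside `scan` (waiters would hang), which is one reason to prefer the real singleflight package in practice.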