Replace load average with PSI metric
The load average metric is a misleading representation of CPU saturation. Plain CPU utilization is a better representation of actual saturation.
On newer Linux kernels, there is a Pressure Stall Information (PSI) metric that better represents CPU oversaturation. It is also useful because it can make single-core saturation more visible.
Signed-off-by: Ben Kochie [email protected]
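For anyone unfamiliar with what PSI exposes, here is a minimal sketch of parsing the kernel's `/proc/pressure/cpu` format. The `parse_psi` helper and the sample values are made up for illustration; only the line layout (`some avg10=... avg60=... avg300=... total=...`) comes from the kernel interface, and this is not how the exporter itself reads the data.

```python
def parse_psi(text):
    """Parse the contents of a PSI file such as /proc/pressure/cpu.

    Returns a dict keyed by line type ("some" or "full"), each mapping
    field names to numbers.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # avg10/avg60/avg300 are percentages of time stalled;
            # total is the cumulative stall time in microseconds.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# Illustrative sample only; real values come from /proc/pressure/cpu.
sample = "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
psi = parse_psi(sample)
```

The `some` line reports the share of time in which at least one task was stalled waiting for CPU, which is why it can surface saturation that load average hides.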
We'll need to update the dashboards as well. I'm also not sure how common recent kernels with PSI enabled are, so this might break things for quite a few users. Either way, people might already depend on the metric, so it's probably better to add a new one?
I guess we could mark the existing metric deprecated.
I feel that it's too early to rely on PSI metrics being available. The latest RHEL 8/CentOS 8 doesn't have it enabled. Across my fleet I'm only getting it from Fedora, so I'd expect CentOS 9 to have it enabled. I don't have any Debian/Ubuntu systems though.
I agree with @ventifus. What about adding a config option to use PSI instead of load average?
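A rough sketch of what such an option could look like, with a fallback when PSI is not usable. The names `psi_available`, `saturation_source`, and the `prefer_psi` flag are hypothetical and not part of any existing tool. The check reads the file rather than only testing for its existence, on the assumption that some kernels ship the file but return an error on read when PSI is disabled at boot.

```python
PSI_CPU_PATH = "/proc/pressure/cpu"  # exposed only by kernels with PSI support


def psi_available(path=PSI_CPU_PATH):
    """Return True if the PSI file can actually be read."""
    try:
        with open(path) as f:
            f.read()
        return True
    except OSError:
        # Missing file, permission problem, or PSI disabled at boot.
        return False


def saturation_source(prefer_psi, path=PSI_CPU_PATH):
    """Pick which saturation signal to use: 'psi' or 'loadavg'.

    prefer_psi is the hypothetical config option; load average stays
    the default and the fallback so existing users are not broken.
    """
    if prefer_psi and psi_available(path):
        return "psi"
    return "loadavg"
```

With this shape, hosts without PSI keep reporting load average even when the option is turned on, so the dashboards would not go blank.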
Some time has passed, and PSI should now be available on all current Linux distros. Yes, there are still improvements coming in (https://www.phoronix.com/news/Linux-6.1-PSI), but it would be nice to have this as an additional utilization metric, maybe as a first step toward replacing load avg?