node_exporter

Replace load average with PSI metric

Open SuperQ opened this issue 4 years ago • 5 comments

The load average metric is misleading as a representation of CPU saturation. Normal CPU utilization is a more realistic representation of saturation.

On newer Linux kernels, there is a Pressure Stall Information (PSI) metric that better represents CPU oversaturation. It is also useful because it can make single-core saturation more visible.
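For context, the kernel exposes CPU pressure in `/proc/pressure/cpu` as lines of the form `some avg10=... avg60=... avg300=... total=...`, where the `avg*` fields are percentages of time stalled over the last 10, 60, and 300 seconds and `total` is cumulative stall time in microseconds. A minimal sketch of parsing that format follows; the function name `parse_psi` and the sample values are illustrative assumptions, not node_exporter code.

```python
def parse_psi(text):
    """Parse the contents of a /proc/pressure/<resource> file.

    Returns a dict keyed by line type, e.g.
    {"some": {"avg10": 1.5, "avg60": 0.75, "avg300": 0.2, "total": 123456}}.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            key, _, raw = pair.partition("=")
            # "total" is an integer microsecond counter; avg* are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# Made-up sample line in the kernel's documented format.
sample = "some avg10=1.50 avg60=0.75 avg300=0.20 total=123456\n"
psi = parse_psi(sample)
print(psi["some"]["avg10"], psi["some"]["total"])
```

On a live system the same function could be fed the contents of `/proc/pressure/cpu`, assuming the file is present and readable.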

Signed-off-by: Ben Kochie

Aug 19 '21 10:08 SuperQ

We'll need to update the dashboards as well. I'm also not sure how common recent kernels with PSI enabled are, so this might break quite a few users. Either way, people might already depend on the metric, so it's probably better to add a new one?

Aug 19 '21 12:08 discordianfish

I guess we could mark the existing metric deprecated.

Aug 19 '21 12:08 SuperQ

I feel that it's too early to rely on PSI metrics being available. The latest RHEL 8/CentOS 8 doesn't have it enabled. Across my fleet I'm only getting it from Fedora, so I'd expect CentOS 9 to have it enabled. I don't have any Debian/Ubuntu systems though.
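One way to survey a fleet for this is to try reading the PSI file on each host. The sketch below is an assumption-laden illustration, not node_exporter code: the function name `psi_status` and its return strings are made up here. It treats a missing file and a failed read separately because, as far as I know, some kernels built with PSI but with it disabled at boot still expose the file and fail the read instead.

```python
def psi_status(path="/proc/pressure/cpu"):
    """Classify PSI availability for one pressure file.

    Returns "available" if the file can be read, "missing" if it does
    not exist, and "unreadable" for any other OS-level read failure.
    """
    try:
        with open(path) as f:
            f.read()
    except FileNotFoundError:
        # Kernel without PSI support, or a non-Linux system.
        return "missing"
    except OSError:
        # The file exists but cannot be read (for example, PSI disabled).
        return "unreadable"
    return "available"


print(psi_status())
```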

Aug 19 '21 17:08 ventifus

I agree with @ventifus. What about adding a config option to use PSI instead of load average?
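To make the idea concrete, here is a rough sketch of what such an opt-in could look like, with a fallback to load average when PSI cannot be read. Everything here is hypothetical: the `use_psi` parameter, the function name `cpu_saturation`, and the returned labels do not correspond to any existing node_exporter flag or metric name. It also assumes a Unix-like system, since `os.getloadavg()` is not available everywhere.

```python
import os


def cpu_saturation(use_psi, psi_path="/proc/pressure/cpu"):
    """Return (source, value) for a CPU saturation signal.

    When use_psi is true and the PSI file is readable, report the
    10-second "some" average; otherwise fall back to the 1-minute
    load average.
    """
    if use_psi:
        try:
            with open(psi_path) as f:
                for line in f:
                    fields = line.split()
                    if fields and fields[0] == "some":
                        pairs = dict(p.split("=", 1) for p in fields[1:])
                        return "psi_some_avg10", float(pairs["avg10"])
        except (OSError, KeyError, ValueError):
            # PSI missing, disabled, or unparsable: fall through.
            pass
    return "loadavg_1m", os.getloadavg()[0]


source, value = cpu_saturation(use_psi=True)
print(source, value)
```

Note that the two values are not on the same scale (a percentage of stalled time versus a count of runnable or waiting tasks), so dashboards and alerts would need to know which source is in use.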

Oct 06 '21 09:10 discordianfish

Some time has passed, and PSI should be available on all current Linux distros. Yes, there are still improvements coming in (https://www.phoronix.com/news/Linux-6.1-PSI), but it would be nice to have this as an additional utilization metric, maybe as a first step toward replacing load average?

May 23 '23 14:05 frittentheke