pgmonitor

Exporter Queries Incompatible with 'Unified' Cgroup Mode

Open: andrewlecuyer opened this issue 3 years ago • 3 comments

Describe the issue:

As cgroup v2 support/adoption within Kubernetes continues to progress, I am starting to run into issues with certain exporter queries when a PostgreSQL container is running on a node that is itself running in unified cgroup mode (meaning only cgroup v2 is exposed).

Specifically, if the PG container is running on a node where pgnodemx shows a unified cgroup mode, e.g.:

postgres=# SELECT monitor.cgroup_mode();
 cgroup_mode 
-------------
 unified
(1 row)

...I then see the following errors in the exporter logs:

time="2022-02-15T00:54:47Z" level=error msg="queryNamespaceMappings returned 3 errors" source="postgres_exporter.go:1474"
time="2022-02-15T00:55:01Z" level=info msg="Error running query on database \"localhost:5432\": ccp_nodemx_cpucfs pq: could not open file \"/sys/fs/cgroup//cpu.cfs_period_us\" for reading: No such file or directory" source="postgres_exporter.go:1356"
time="2022-02-15T00:55:02Z" level=info msg="Error running query on database \"localhost:5432\": ccp_nodemx_cpuacct pq: failed to find controller cpuacct" source="postgres_exporter.go:1356"
time="2022-02-15T00:55:02Z" level=info msg="Error running query on database \"localhost:5432\": ccp_nodemx_mem pq: could not open file \"/sys/fs/cgroup//memory.limit_in_bytes\" for reading: No such file or directory" source="postgres_exporter.go:1356"
time="2022-02-15T00:55:02Z" level=error msg="queryNamespaceMappings returned 3 errors" source="postgres_exporter.go:1474"

However, if I move that same container to a node where pgnodemx shows a legacy cgroup mode, e.g.:

postgres=# SELECT monitor.cgroup_mode();
 cgroup_mode 
-------------
 legacy
(1 row)

...I then see no errors in the exporter logs.

Therefore, it appears that the exporter queries expect cgroup v1 to be present, which cannot always be guaranteed.
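The two psql outputs above differ only in the reported mode. To illustrate what separates the modes, here is a minimal Python sketch. It assumes the mode can be read from the filesystem types in /proc/self/mounts, and it maps the v1 files named in the log errors to their cgroup v2 counterparts. `classify_cgroup_mode` and `V1_TO_V2` are hypothetical names, and this is not how pgnodemx or pgMonitor actually detect the mode. The "hybrid" and "disabled" labels, and the v1 files beyond the two in the log, are also assumptions.

```python
# Hypothetical helper (not pgnodemx's implementation): classify the cgroup
# mode of a host from the text of /proc/self/mounts, using labels like the
# ones monitor.cgroup_mode() reported in the issue ("legacy", "unified").

CGROUP_ROOT = "/sys/fs/cgroup"


def classify_cgroup_mode(mounts_text: str) -> str:
    """Return 'unified', 'hybrid', 'legacy', or 'disabled'."""
    fstypes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        mount_point, fstype = fields[1], fields[2]
        fstypes[mount_point] = fstype

    # cgroup2 mounted directly at the root: only cgroup v2 is exposed.
    if fstypes.get(CGROUP_ROOT) == "cgroup2":
        return "unified"

    has_v1 = any(fstype == "cgroup" for fstype in fstypes.values())
    # v1 controllers plus a cgroup2 mount under /sys/fs/cgroup/unified.
    if has_v1 and fstypes.get(CGROUP_ROOT + "/unified") == "cgroup2":
        return "hybrid"
    if has_v1:
        return "legacy"
    return "disabled"


# v1 file a query might read -> cgroup v2 file holding the equivalent data.
V1_TO_V2 = {
    "cpu.cfs_period_us": "cpu.max",         # v2 holds "$MAX $PERIOD"
    "cpu.cfs_quota_us": "cpu.max",
    "cpuacct.usage": "cpu.stat",            # v2 usage_usec is in microseconds
    "memory.limit_in_bytes": "memory.max",
    "memory.usage_in_bytes": "memory.current",
}
```

On a live host, the function would be passed the contents of /proc/self/mounts. In unified mode there is no separate cpuacct controller, which matches the "failed to find controller cpuacct" error above.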

Describe the expected behavior:

To ensure consistent metrics collection, the exporter queries should succeed whether a unified or legacy cgroup mode (i.e. cgroup v2 or cgroup v1) is enabled on a given node. This matters because containers in a Kubernetes cluster can be rescheduled onto different nodes over time, and each node may have a different cgroup mode enabled.
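One way to meet this expectation is to read the mode-specific file and normalize its contents to a common shape. The Python sketch below does this, assuming the caller already knows the mode and has the file text. The function names and the choice of return values (-1 for an unlimited CPU quota, `None` for an unlimited memory limit) are hypothetical and are not taken from the fix in pgMonitor.

```python
from typing import Optional, Tuple


def parse_cpu_max(text: str) -> Tuple[int, int]:
    """Parse v2 cpu.max ("$MAX $PERIOD") into v1-style (quota_us, period_us).

    v2 writes "max" for an unlimited quota; v1 cpu.cfs_quota_us used -1.
    """
    quota_str, period_str = text.split()
    quota = -1 if quota_str == "max" else int(quota_str)
    return quota, int(period_str)


def parse_memory_max(text: str) -> Optional[int]:
    """Parse v2 memory.max in bytes; "max" (no limit) becomes None."""
    value = text.strip()
    return None if value == "max" else int(value)


def cpu_usage_ns(mode: str, text: str) -> int:
    """Return cumulative CPU usage in nanoseconds for either mode.

    unified: text is cpu.stat, whose usage_usec line is in microseconds.
    legacy/hybrid: text is cpuacct.usage, a single nanosecond counter.
    """
    if mode == "unified":
        for line in text.splitlines():
            key, _, value = line.partition(" ")
            if key == "usage_usec":
                return int(value) * 1000  # microseconds -> nanoseconds
        raise ValueError("usage_usec not found in cpu.stat")
    return int(text.strip())
```

For example, a pod limited to two CPUs on a unified node might expose a cpu.max of "200000 100000", which `parse_cpu_max` turns into a quota of 200000 µs per 100000 µs period. This is the same ratio a v1 node would report through cpu.cfs_quota_us and cpu.cfs_period_us.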

Tell us about your environment:

  • pgMonitor version: v4.5
  • Container or non-container: Container
  • Container name / image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.0.4-0
  • Operating System for non-container: N/A
  • PostgreSQL Version: 14.1
  • Exporter(s) in use (incl. version): 0.8.0
  • Prometheus version: 2.27.1
  • AlertManager version: 0.22.2
  • Grafana version: 7.4.5

andrewlecuyer, Feb 15 '22

I can confirm the same behavior. I'm running Kubernetes on Talos nodes, which don't support cgroup v1.

ammmze, Apr 25 '22

I can confirm that I'm seeing this when running on Linode's managed LKE.

bendilley, Aug 22 '22

Same issue on RKE2.

gricuk, Sep 15 '22

@andrewlecuyer Just double-checking: this is the issue that was fixed by https://github.com/CrunchyData/pgmonitor/pull/303, correct?

keithf4, Oct 25 '22

What image version of crunchy-postgres-exporter will include the fix? ubi8-5.2.1-0 still does not have it.

zerkms, Dec 19 '22

I was disappointed by that too @zerkms

bendilley, Dec 19 '22

I checked with our containers team. They should have an update to this in the near future.

keithf4, Dec 20 '22

This issue has been fixed in the v5.3.0 release of Crunchy Postgres for Kubernetes:

https://access.crunchydata.com/documentation/postgres-operator/latest/releases/5.3.0/
https://www.crunchydata.com/developers/download-postgres/containers/postgres-operator-5x

This includes an updated version of the crunchy-postgres-exporter container containing the required fix (Crunchy Postgres for Kubernetes v5.3.0 updates pgMonitor to v4.8.0).

Please feel free to reach out if you continue to have any issues.

andrewlecuyer, Jan 03 '23

v5.3.0 appears to be working for me - thank you @andrewlecuyer and @tony-landreth!

bendilley, Jan 03 '23