resource_monitors: Add cgroupsv2 support for CPU Resource Monitor
Commit Message: Add cgroupsv2 support for CPU Resource Monitor
Additional Description:
Currently the CPU resource monitor works exclusively with the cgroupsv1 layout. When Envoy runs on instances that use cgroupsv2, we see the errors below:
2025-09-18T22:20:21.751941Z error envoy misc external/envoy/source/extensions/resource_monitors/cpu_utilization/linux_cpu_stats_reader.cc:67 Can't open linux cpu allocated file /sys/fs/cgroup/cpu/cpu.shares thread=22
2025-09-18T22:20:21.751984Z info envoy main external/envoy/source/server/overload_manager_impl.cc:754 Failed to update resource envoy.resource_monitors.cpu_utilization: Can't open file to read CPU utilization thread=22
This is due to differences in the folder structure and file values of cgroupsv2 compared to cgroupsv1. More details can be found in issues #40571 and #39978.
cgroupsv1
$ ls /sys/fs/cgroup -1
blkio
cpu
cpu,cpuacct
cpuacct
cpuset
devices
freezer
hugetlb
memory
net_cls
net_cls,net_prio
net_prio
perf_event
pids
systemd
cgroupsv2
$ ls /sys/fs/cgroup -1
cgroup.controllers
cgroup.events
cgroup.freeze
cgroup.kill
cgroup.max.depth
cgroup.max.descendants
cgroup.pressure
cgroup.procs
cgroup.stat
cgroup.subtree_control
cgroup.threads
cgroup.type
cpu.idle
cpu.max
cpu.max.burst
cpu.pressure
cpu.stat
cpu.weight
cpu.weight.nice
cpuset.cpus
cpuset.cpus.effective
cpuset.cpus.partition
cpuset.mems
cpuset.mems.effective
To calculate utilization we use the usage_usec value in the cpu.stat file across multiple intervals and normalize it to the group's CPU capacity. Unlike cgroupsv1, the usage values are reported in microseconds instead of nanoseconds. The cpu.max values below (60000 100000) correspond to the 600m (0.6 CPU) limit currently assigned to the container.
$ cd /sys/fs/cgroup
$ more cpuset.cpus.effective
0-15
$ more cpu.stat
usage_usec 1759327475
user_usec 1236034525
system_usec 523292950
core_sched.force_idle_usec 0
nr_periods 70619
nr_throttled 10445
throttled_usec 385185405
nr_bursts 0
burst_usec 0
$ more cpu.max
60000 100000
$
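As a minimal illustrative sketch (not Envoy's actual C++ implementation; the helper names here are hypothetical), parsing the two cgroupsv2 files shown above could look like:

```python
# Hypothetical sketch: parse the cgroupsv2 cpu.stat and cpu.max payloads
# shown in the session above. Not Envoy's actual implementation.

def parse_cpu_stat(text):
    """Return the usage_usec value from a cpu.stat payload."""
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found in cpu.stat")

def parse_cpu_max(text):
    """Return (quota, period) from cpu.max; quota is None when set to 'max'."""
    quota_str, period_str = text.split()
    quota = None if quota_str == "max" else int(quota_str)
    return quota, int(period_str)

# Values from the session above:
usage = parse_cpu_stat("usage_usec 1759327475\nuser_usec 1236034525")
quota, period = parse_cpu_max("60000 100000")
print(usage, quota, period)  # 1759327475 60000 100000
```

A real implementation would read these from /sys/fs/cgroup/cpu.stat and /sys/fs/cgroup/cpu.max on each sampling interval.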
The goal of this change is to compute normalized CPU utilization — a value between 0.0 and 1.0, representing how much CPU capacity a cgroup is using relative to its assigned limits.
Core Formulae
- CPU Usage and Interval
Δusage = usage_usec(t2) − usage_usec(t1) # microseconds
Δinterval = t2 − t1 # seconds
Δusage: change in total CPU time used between two timestamps
Δinterval: real elapsed time between measurements
- Effective CPU Capacity
Let:
N = number of CPUs available to the cgroup (usually from cpuset.cpus.effective)
cpu.max = quota and period configuration, defined as:
cpu.max = <quota> <period>
If a quota applies:
Q = quota / period # in CPU cores
C = min(N, Q) # effective capacity
Example (from the cpu.max shown above):
cpu.max = 60000 100000
Q = 60000 / 100000 = 0.6
If no quota is set (i.e. cpu.max = max):
C = N
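The effective-capacity rule above can be sketched as follows (a hypothetical helper for illustration, not Envoy code):

```python
# Sketch of the effective-capacity rule: C = min(N, quota/period),
# or C = N when cpu.max has no quota ("max <period>").
def effective_capacity(num_cpus, quota, period):
    """Return effective CPU capacity C in cores for a cgroup."""
    if quota is None:  # cpu.max reads "max <period>": no limit applies
        return float(num_cpus)
    return min(float(num_cpus), quota / period)

print(effective_capacity(16, 60000, 100000))  # 0.6 (quota dominates)
print(effective_capacity(16, None, 100000))   # 16.0 (no quota set)
```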
- Normalized CPU Utilization
util_normalized = (Δusage / 1_000_000) / (Δinterval * C)
Where:
- Δusage / 1_000_000 converts microseconds to seconds
- The denominator (Δinterval * C) represents total available CPU time
- util_normalized is a fractional value (0.0 to 1.0)
Example
| Parameter | Description | Value |
|---|---|---|
| usage_usec(t1) | CPU time at start | 1,000,000 µs |
| usage_usec(t2) | CPU time at end | 2,200,000 µs |
| Δusage | 2,200,000 − 1,000,000 | 1,200,000 µs |
| Δinterval | Elapsed time | 2 s |
| C | Effective capacity | 2 CPUs |
util_normalized = (1,200,000 / 1_000_000) / (2 * 2)
= 1.2 / 4
= 0.3
Normalized CPU Utilization = 0.3 (30%)
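The worked example above can be reproduced with a short sketch (the function name is hypothetical, for illustration only):

```python
# Sketch of the normalization formula:
# util = (Δusage / 1_000_000) / (Δinterval * C)
def normalized_utilization(usage_t1_usec, usage_t2_usec, interval_sec, capacity):
    """Return fractional CPU utilization (0.0 to 1.0) over an interval."""
    delta_usage_sec = (usage_t2_usec - usage_t1_usec) / 1_000_000
    return delta_usage_sec / (interval_sec * capacity)

# Values from the example table above:
util = normalized_utilization(1_000_000, 2_200_000, 2, 2)
print(util)  # 0.3
```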
Risk Level:
Testing:
- Added comprehensive unit tests.
- Validated the same code on an internal fork across the following cases:
  1. Overload manager CPU utilization resource monitor on AL2023 instances with a CPU limit
  2. Overload manager CPU utilization resource monitor on AL2023 instances without a CPU limit
  3. Overload manager CPU utilization resource monitor on Amazon Linux 2023 with different instance types
  4. Overload manager CPU utilization resource monitor on Amazon Linux 2 for cgroupsv1 backwards compatibility
Docs Changes:
Release Notes:
Platform Specific Features:
Fixes #40571, #39978
Hi @rajeshetty87, welcome and thank you for your contribution.
We will try to review your Pull Request as quickly as possible.
In the meantime, please take a look at the contribution guidelines if you have not done so already.
As a reminder, PRs marked as draft will not be automatically assigned reviewers, or be handled by maintainer-oncall triage.
Please mark your PR as ready when you want it to be reviewed!
@KBaichoo One of the Envoy checks is failing on code coverage (required 96.6%, current 94%).
I am not sure the code coverage is being calculated correctly. I have added comprehensive tests, but coverage won't rise beyond 94%. I am also unable to verify this locally, as bazel coverage //test/extensions/resource_monitors/cpu_utilization:all doesn't work for me on macOS.
Would appreciate any help/pointers.
Re: coverage - there should be a link to the coverage report in the CI logs. It's not perfect and misses macros easily.
/assign-from @envoyproxy/envoy-maintainers
@envoyproxy/envoy-maintainers assignee is @adisuissa
/coverage
Coverage for this Pull Request will be rendered here:
https://storage.googleapis.com/envoy-cncf-pr/42014/coverage/index.html
For comparison, current coverage on main branch is here:
https://storage.googleapis.com/envoy-cncf-postsubmit/main/coverage/index.html
The coverage results are (re-)rendered each time the CI Envoy/Checks (coverage) job completes.
@KBaichoo updated
- refactored the code to use a factory pattern
- moved utilization calculations to CpuTime object
- updated change log
Please take a look when you get a chance.
@rajeshetty87 As I understand it, you are using the container CPU resource monitor to measure the utilization of the Envoy sidecar itself, is that correct? In my company we use the "Linux" CPU resource monitor to measure the utilization of the whole VM, including the workload that sits next to the Envoy process, which helps because we also want to take the workload's own utilization into account. In a containerized world this doesn't work, so I am wondering what your use case is and what your setup looks like.