PCM Does Not Display LMB and RMB Metrics in Prometheus Integration
Hello Intel PCM and PQoS developers,
I am facing an issue with Intel PCM where it fails to report local and remote memory bandwidth (LMB and RMB) metrics when monitored through Prometheus, despite these metrics being available and correctly reported when using pqos-msr.
Environment: OS: Linux kernel 5.15.0-112-generic
Configuration:
RDT features enabled (rdt=cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, mba) through GRUB configuration: GRUB_CMDLINE_LINUX="hugepagesz=1G hugepages=12 rdt=cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, mba"
CONFIG_X86_CPU_RESCTRL=y in kernel configuration
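As a sanity check (a sketch, not part of the original report): the kernel parses `rdt=` as a single comma-separated token, so the spaces in the GRUB line above would make everything after `rdt=cmt,` a separate, ignored boot parameter. The following commands show how the booted kernel actually parsed the option and which monitoring events the resctrl driver exposes:

```shell
# Show how the kernel actually parsed the rdt= option; everything after
# the first space on the GRUB line becomes a separate (ignored) parameter
grep -o 'rdt=[^ ]*' /proc/cmdline

# List the monitoring events the resctrl driver exposes; mbm_local_bytes
# must appear here for LMB/RMB to be readable through resctrl
cat /sys/fs/resctrl/info/L3_MON/mon_features
```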
Issue: When monitoring server performance metrics using Intel PCM with Prometheus, the local and remote memory bandwidth metrics (LMB and RMB) consistently report as 0, indicating no data is being captured or transmitted. However, using pqos-msr, these metrics are clearly available and accurately reported.
Execute sudo pqos-msr -d and note the additional memory bandwidth metrics being monitored.
sudo pqos-msr -d
NOTE: Mixed use of MSR and kernel interfaces to manage
CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
Hardware capabilities
Monitoring
Cache Monitoring Technology (CMT) events:
LLC Occupancy (LLC)
I/O RDT: unsupported
Memory Bandwidth Monitoring (MBM) events:
Total Memory Bandwidth (TMEM)
I/O RDT: unsupported
Local Memory Bandwidth (LMEM)
I/O RDT: unsupported
Remote Memory Bandwidth (RMEM) (calculated)
I/O RDT: unsupported
PMU events:
Instructions/Clock (IPC)
LLC misses
LLC references
LLC misses - pcie read
LLC misses - pcie write
LLC references - pcie read
LLC references - pcie write
Allocation
Cache Allocation Technology (CAT)
L3 CAT
CDP: disabled
Non-Contiguous CBM: unsupported
I/O RDT: unsupported
Num COS: 16
Memory Bandwidth Allocation (MBA)
Num COS: 8
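The "(calculated)" tag on RMEM above means remote bandwidth is not read from a dedicated counter but derived as total minus local. The same arithmetic applies to the raw MBM byte counters (a sketch with invented sample values):

```shell
# Two samples of the MBM byte counters, one second apart (values invented)
total1=1000000; local1=600000
total2=5000000; local2=3000000
interval=1

# RMB = (delta total - delta local) / interval
rmb=$(( ( (total2 - total1) - (local2 - local1) ) / interval ))
echo "remote bytes/s: $rmb"   # → remote bytes/s: 1600000
```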
Expected Behavior: Intel PCM should accurately capture and export all available memory bandwidth metrics, including LMB and RMB, to Prometheus.
Actual Behavior: LMB and RMB metrics appear as 0 in Prometheus, suggesting an issue with either the PCM data capture or the export process.
I appreciate any assistance or guidance you can provide and am available for further testing or to provide additional information as needed.
Hi, thanks for creating the issue. Could you please also share the output of the lscpu command?
Hi, thank you for getting back to me so quickly. Here is the lscpu output:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 28
On-line CPU(s) list: 0-27
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
CPU family: 6
Model: 85
Thread(s) per core: 1
Core(s) per socket: 14
Socket(s): 2
Stepping: 4
CPU max MHz: 2600.0000
CPU min MHz: 1000.0000
BogoMIPS: 5200.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts pku ospke md_clear flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 896 KiB (28 instances)
L1i: 896 KiB (28 instances)
L2: 28 MiB (28 instances)
L3: 38.5 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27
Vulnerabilities:
Gather data sampling: Mitigation; Microcode
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Mds: Mitigation; Clear CPU buffers; SMT disabled
Meltdown: Mitigation; PTI
Mmio stale data: Mitigation; Clear CPU buffers; SMT disabled
Retbleed: Mitigation; IBRS
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Mitigation; Clear CPU buffers; SMT disabled
pqos monitoring vs pqos-msr monitoring
## monitoring with pqos
TIME 2024-06-18 16:55:31
CORE IPC MISSES LLC[KB]
0 1.12 165k 336.0
1 0.95 218k 1904.0
2 0.86 164k 448.0
3 0.86 256k 448.0
4 0.80 106k 504.0
5 1.38 263k 2128.0
6 0.89 197k 280.0
7 1.51 209k 560.0
8 1.18 218k 2240.0
9 0.87 142k 840.0
10 0.74 272k 336.0
11 1.37 165k 504.0
12 0.65 230k 392.0
13 1.57 464k 1792.0
14 0.78 212k 560.0
15 0.79 253k 560.0
16 0.73 188k 336.0
17 0.84 263k 728.0
18 1.67 619k 10864.0
19 1.04 88k 616.0
20 1.09 266k 448.0
21 1.40 226k 840.0
22 0.89 164k 448.0
23 1.50 299k 560.0
24 1.45 321k 112.0
25 1.44 232k 224.0
26 0.93 313k 1568.0
27 1.48 372k 6328.0
## monitoring with pqos-msr
TIME 2024-06-18 16:56:06
CORE IPC MISSES LLC[KB] MBL[MB/s] MBR[MB/s]
0 0.70 30k 1008.0 0.7 1.3
1 1.77 126k 896.0 3.3 2.7
2 1.34 19k 1848.0 0.7 0.8
3 0.73 71k 504.0 1.3 1.0
4 0.74 16k 952.0 0.9 0.4
5 0.71 77k 392.0 1.8 2.2
6 0.73 64k 2072.0 2.0 1.0
7 1.84 497k 0.0 0.0 0.0
8 0.83 47k 1568.0 1.3 1.0
9 1.88 220k 1288.0 6.8 10.5
10 0.87 43k 1008.0 1.2 0.7
11 0.76 68k 728.0 1.4 1.7
12 0.85 49k 560.0 0.7 0.5
13 0.73 71k 224.0 2.0 1.9
14 0.86 75k 1736.0 3.2 2.6
15 1.67 127k 3920.0 5.1 8.6
16 1.09 79k 2184.0 2.2 3.1
17 0.92 117k 0.0 0.0 0.0
18 0.87 43k 1064.0 0.7 2.1
19 1.83 132k 2632.0 3.8 5.1
20 0.94 45k 0.0 0.0 0.0
21 0.64 28k 224.0 0.9 0.3
22 0.86 68k 1792.0 2.1 1.6
23 0.90 83k 1400.0 2.7 1.8
24 0.83 26k 1512.0 0.4 0.2
25 1.81 33k 1736.0 0.7 1.4
26 0.77 56k 1344.0 0.7 1.0
27 1.65 211k 2464.0 7.5 10.6
WARN: Core 7 RMID association changed from 4 to 0! The core has been hijacked!
WARN: Core 17 RMID association changed from 9 to 0! The core has been hijacked!
WARN: Core 20 RMID association changed from 11 to 0! The core has been hijacked!
Questions:
- Could there be specific configurations or enhancements within PCM that might enable it to access and display LMB and RMB metrics as pqos-msr does?
- Are there known limitations or conditions under which PCM might not access certain MSR registers effectively?
- Any guidance or recommended settings that could help ensure PCM captures all relevant MSR data, particularly for memory bandwidth metrics?
Thanks, that is helpful. On your CPU we disable reading these RDT counters from hardware due to an erratum; the Linux kernel does the same: https://github.com/torvalds/linux/commit/d56593eb5eda8f593db92927059697bbf89bc4b3. However, when booting the Linux kernel with your RDT options above, RDT is re-enabled in the kernel. We can add a similar option in PCM to re-enable these metrics.
Thank you for the quick response and for clarifying the situation with the RDT counters on my CPU.
I appreciate the suggestion to add an option in PCM to re-enable these metrics. To provide further context, I have already enabled RDT features from the boot configuration to ensure that all requisite counters are available at the OS level. My current GRUB configuration is set as follows:
GRUB_CMDLINE_LINUX="hugepagesz=1G hugepages=12 rdt=cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, mba"
This configuration was intended to ensure that the RDT options are fully enabled within the Linux kernel. Given this setup, I would be interested in understanding if there might be additional steps or configurations within PCM that need to be aligned with this kernel setting to effectively monitor these metrics.
Looking forward to your guidance on how we might proceed to achieve comprehensive monitoring capabilities.
The change has been implemented. Set this environment variable: export PCM_ENFORCE_MBM=1
https://github.com/intel/pcm/blob/master/doc/ENVVAR_README.md
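A minimal sketch of using the new variable (PCM reads environment variables at startup, so export it in the same shell before launching; `sudo -E` preserves the variable if the server is started via sudo):

```shell
# Enable MBM in PCM despite the CPU erratum, then start the sensor server
export PCM_ENFORCE_MBM=1
sudo -E ./pcm-sensor-server

# In a second terminal, verify that the RDT metrics are now exported
curl --silent http://localhost:9738/metrics | grep Memory_Bandwidth
```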
Thank you for your previous response and the guidance provided.
I have successfully mounted the resctrl filesystem and set the environment variable PCM_ENFORCE_MBM=1 to enforce memory bandwidth monitoring. However, upon starting the PCM sensor server, I encountered errors related to accessing memory bandwidth metrics files in the resctrl filesystem.
~# export PCM_ENFORCE_MBM=1
~# cd zouhirRepos/PCMUpadate/pcm/build/bin/
:~/zouhirRepos/PCMUpadate/pcm/build/bin# ./pcm-sensor-server
===== Processor information =====
Linux arch_perfmon flag : yes
Hybrid processor : no
IBRS and IBPB supported : yes
STIBP supported : yes
Spec arch caps supported : yes
Max CPUID level : 22
CPU model number : 85
Number of physical cores: 28
Number of logical cores: 28
Number of online logical cores: 28
Threads (logical cores) per physical core: 1
Num sockets: 2
Physical cores per socket: 14
Last level cache slices per socket: 14
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2600000000 Hz
IBRS enabled in the kernel : yes
STIBP enabled in the kernel : no
The processor is not susceptible to Rogue Data Cache Load: no
The processor supports enhanced IBRS : no
Package thermal spec power: 140 Watt; Package minimum power: 66 Watt; Package maximum power: 297 Watt;
INFO: Linux perf interface to program uncore PMUs is present
Socket 0: 2 memory controllers detected with total number of 6 channels. 3 UPI ports detected. 2 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 1: 2 memory controllers detected with total number of 6 channels. 3 UPI ports detected. 2 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 0: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 14 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Socket 1: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 14 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
INFO: using Linux resctrl driver for RDT metrics (L3OCC, LMB, RMB) because resctrl driver is mounted.
Closed perf event handles
Trying to use Linux perf events...
Successfully programmed on-core PMU using Linux perf
Socket 0
Max UPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Socket 1
Max UPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Starting plain HTTP server on http://localhost:9738/
Error Messages:
Error reading /sys/fs/resctrl/mon_groups/pcm10/mon_data/mon_L3_01/mbm_local_bytes. Error: No such file or directory
ERROR: Can not open /sys/fs/resctrl/mon_groups/pcm10/mon_data/mon_L3_00/mbm_total_bytes file.
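The missing files can be inspected directly; if the kernel did not enable the local-MBM event, only llc_occupancy (and possibly mbm_total_bytes) will be present. A diagnostic sketch, assuming the default resctrl layout:

```shell
# Events the kernel created monitoring files for
cat /sys/fs/resctrl/info/L3_MON/mon_features

# Files actually present for PCM's monitoring groups
ls /sys/fs/resctrl/mon_groups/*/mon_data/mon_L3_*/ 2>/dev/null
```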
Despite the filesystem already being mounted, I ran into problems when trying to adjust the mount options to enable different features.
Mount attempt 1:
# mount -t resctrl resctrl /sys/fs/resctrl
mount: /sys/fs/resctrl: resctrl already mounted on /sys/fs/resctrl.
Mount attempt 2: I tried to remount the resctrl filesystem with specific options but received a "bad usage" error:
mount -t resctrl resctrl -o cdp,cdpl2,mba_MBps /sys/fs/resctrl
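resctrl options can only be set at mount time, so the filesystem has to be unmounted before mounting it again with different options. A sketch (cdpl2 is omitted here on the assumption that it is unsupported, since lscpu shows cdp_l3 but no L2 CDP flag):

```shell
# resctrl mount options cannot be changed while the filesystem is mounted
sudo umount /sys/fs/resctrl

# Mount again with the desired options (options before the mount point)
sudo mount -t resctrl -o cdp,mba_MBps resctrl /sys/fs/resctrl
```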
Your insights or any further guidance on how to resolve these file access and mounting issues would be greatly appreciated.
There seems to be an issue with the Linux RDT driver (config). Could you try unmounting resctrl and setting this env variable for PCM:
export PCM_USE_RESCTRL=0
Then PCM will access RDT directly.
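The suggested sequence, as a sketch (assuming PCM_ENFORCE_MBM=1 from earlier is still wanted alongside the direct-access mode):

```shell
sudo umount /sys/fs/resctrl   # release the RDT counters from the kernel driver
export PCM_USE_RESCTRL=0      # make PCM program the RDT MSRs directly
export PCM_ENFORCE_MBM=1      # keep MBM enforced despite the CPU erratum
sudo -E ./pcm-sensor-server
```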
Hello,
I'm experiencing an issue where Local and Remote Memory Bandwidth (LMB and RMB) metrics are not displayed in Prometheus, despite proper configuration and troubleshooting steps taken.
Steps and observations:
- Based on a suggestion, I disabled the Linux RDT driver via resctrl with `export PCM_USE_RESCTRL=0` and unmounted resctrl to allow PCM direct access to RDT. However, this did not resolve the issue.
- I attempted to create a custom monitoring solution that collects only the LMB and RMB from the `pqos-msr` output and integrates this data with the PCM data in Prometheus. Unfortunately, I could not run the PCM server and pqos monitoring simultaneously: the error message indicates that monitoring on core 0 is already started.
I would appreciate any insights or potential solutions
Thank you for your assistance.
Hello,
I'm experiencing an issue where Local and Remote Memory Bandwidth (LMB and RMB) metrics are not displayed in Prometheus, despite proper configuration and troubleshooting steps taken.
Steps and observations:
- Based on a suggestion, I disabled the Linux RDT driver via resctrl with `export PCM_USE_RESCTRL=0` and unmounted resctrl to allow PCM direct access to RDT. However, this did not resolve the issue.
Sorry for the delay (I was out of office). Could you please share the output of ./pcm-sensor-server in this scenario?
- I attempted to create a custom monitoring solution that collects only the LMB and RMB from the `pqos-msr` output and integrates this data with the PCM data in Prometheus. Unfortunately, I could not run the PCM server and pqos monitoring simultaneously: the error message indicates that monitoring on core 0 is already started.
This is expected. You can't run pcm and pqos at the same time to monitor the RDT metrics, because they both try to program/use the counters exclusively.
Could you please also share the complete output of the main "pcm -r -i=1" utility (run exclusively, i.e. without pcm-sensor-server or pqos running)?
And also the output of "curl --silent http://localhost:9738/metrics | grep Memory_Bandwidth" when pcm-sensor-server is run exclusively?
Thank you for your response. Here is the output of the requested commands:
# ./pcm-sensor-server
root@seroics:~/zouhirRepos/pcm/pcm/build/bin# export PCM_USE_RESCTRL=0
root@seroics:~/zouhirRepos/pcm/pcm/build/bin# ./pcm-sensor-server
===== Processor information =====
Linux arch_perfmon flag : yes
Hybrid processor : no
IBRS and IBPB supported : yes
STIBP supported : yes
Spec arch caps supported : yes
Max CPUID level : 22
CPU model number : 85
Number of physical cores: 28
Number of logical cores: 28
Number of online logical cores: 28
Threads (logical cores) per physical core: 1
Num sockets: 2
Physical cores per socket: 14
Last level cache slices per socket: 14
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 3
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2600000000 Hz
IBRS enabled in the kernel : yes
STIBP enabled in the kernel : no
The processor is not susceptible to Rogue Data Cache Load: no
The processor supports enhanced IBRS : no
Package thermal spec power: 140 Watt; Package minimum power: 66 Watt; Package maximum power: 297 Watt;
INFO: Linux perf interface to program uncore PMUs is present
Socket 0: 2 memory controllers detected with total number of 6 channels. 3 UPI ports detected. 2 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 1: 2 memory controllers detected with total number of 6 channels. 3 UPI ports detected. 2 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 0: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 14 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Socket 1: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 14 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Initializing RMIDs
Closed perf event handles
Trying to use Linux perf events...
Successfully programmed on-core PMU using Linux perf
Socket 0
Max UPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Socket 1
Max UPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Starting plain HTTP server on http://localhost:9738/
# pcm -r -i=1
Processor Counter Monitor (202201-1)
Linux arch_perfmon flag : yes
Hybrid processor : no
IBRS and IBPB supported : yes
STIBP supported : yes
Spec arch caps supported : yes
Number of physical cores: 28
Number of logical cores: 28
Number of online logical cores: 28
Threads (logical cores) per physical core: 1
Num sockets: 2
Physical cores per socket: 14
Last level cache slices per socket: 14
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2600000000 Hz
IBRS enabled in the kernel : yes
STIBP enabled in the kernel : no
The processor is not susceptible to Rogue Data Cache Load: no
The processor supports enhanced IBRS : no
Package thermal spec power: 140 Watt; Package minimum power: 66 Watt; Package maximum power: 297 Watt;
INFO: Linux perf interface to program uncore PMUs is present
Socket 0: 2 memory controllers detected with total number of 6 channels. 3 QPI ports detected. 2 M2M (mesh to memory) blocks detected. 0 Home Agents detected. 3 M3UPI blocks detected.
Socket 1: 2 memory controllers detected with total number of 6 channels. 3 QPI ports detected. 2 M2M (mesh to memory) blocks detected. 0 Home Agents detected. 3 M3UPI blocks detected.
Initializing RMIDs
Resetting PMU configuration
Zeroed PMU registers
Trying to use Linux perf events...
Successfully programmed on-core PMU using Linux perf
Socket 0
Max QPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max QPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max QPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Socket 1
Max QPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max QPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max QPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Detected Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz "Intel(r) microarchitecture codename Skylake-SP" stepping 4 microcode level 0x2007006
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 (read) cache misses
L2MISS: L2 (read) cache misses (including other core's L2 cache *hits*)
L3HIT : L3 (read) cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3MPI : number of L3 (read) cache misses per instruction
L2MPI : number of L2 (read) cache misses per instruction
READ : bytes read from main memory controller (in GBytes)
WRITE : bytes written to main memory controller (in GBytes)
LOCAL : ratio of local memory requests to memory controller in %
L3OCC : L3 occupancy (in KBytes)
TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
energy: Energy in Joules
Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI | L3OCC | TEMP
0 0 0.00 0.53 0.00 0.67 1707 5948 0.59 0.76 0.00 0.00 336 51
1 1 0.00 0.45 0.01 0.75 8879 31 K 0.59 0.69 0.00 0.00 0 46
2 0 0.00 0.54 0.00 0.71 4615 17 K 0.62 0.75 0.00 0.00 448 52
3 1 0.00 0.62 0.00 0.65 1539 7636 0.69 0.81 0.00 0.00 448 46
4 0 0.00 0.54 0.01 0.67 4712 19 K 0.64 0.77 0.00 0.00 112 52
5 1 0.00 0.49 0.00 0.75 1951 6513 0.58 0.72 0.00 0.00 448 46
6 0 0.00 0.66 0.00 0.63 1109 4292 0.60 0.88 0.00 0.00 168 52
7 1 0.00 0.48 0.00 0.66 6900 14 K 0.48 0.66 0.00 0.00 728 48
8 0 0.01 0.75 0.01 0.74 22 K 36 K 0.27 0.73 0.00 0.00 1176 52
9 1 0.00 0.52 0.01 0.79 13 K 37 K 0.44 0.73 0.00 0.00 728 45
10 0 0.01 0.78 0.01 0.97 31 K 34 K 0.07 0.71 0.00 0.00 0 52
11 1 0.01 1.79 0.01 0.87 3355 10 K 0.58 0.91 0.00 0.00 448 47
12 0 0.00 0.52 0.00 0.67 1755 5169 0.57 0.77 0.00 0.00 1456 54
13 1 0.00 0.63 0.00 0.60 3572 11 K 0.52 0.87 0.00 0.00 112 47
14 0 0.00 0.53 0.00 0.69 2992 11 K 0.61 0.76 0.00 0.00 840 53
15 1 0.00 0.60 0.00 0.80 3374 11 K 0.59 0.76 0.00 0.00 336 48
16 0 0.06 1.82 0.03 0.96 24 K 77 K 0.66 0.71 0.00 0.00 56 54
17 1 0.05 1.50 0.03 0.86 34 K 85 K 0.55 0.73 0.00 0.00 12208 47
18 0 0.03 1.38 0.02 0.85 24 K 55 K 0.49 0.73 0.00 0.00 504 54
19 1 0.00 0.50 0.00 0.63 848 4240 0.73 0.79 0.00 0.00 672 49
20 0 0.07 1.57 0.05 0.94 41 K 125 K 0.63 0.69 0.00 0.00 56 53
21 1 0.01 0.84 0.01 0.69 10 K 31 K 0.55 0.80 0.00 0.00 672 50
22 0 0.06 1.28 0.05 0.91 107 K 190 K 0.39 0.66 0.00 0.00 4872 53
23 1 0.01 0.80 0.01 0.68 9274 25 K 0.50 0.81 0.00 0.00 1064 47
24 0 0.01 0.67 0.01 0.73 11 K 30 K 0.52 0.80 0.00 0.00 672 53
25 1 0.01 1.05 0.01 0.67 7555 24 K 0.63 0.78 0.00 0.00 504 46
26 0 0.00 0.78 0.00 0.74 5403 19 K 0.62 0.76 0.00 0.00 1344 51
27 1 0.00 0.32 0.00 0.59 1773 7152 0.67 0.69 0.00 0.00 168 48
---------------------------------------------------------------------------------------------------------------
SKT 0 0.02 1.31 0.01 0.87 285 K 632 K 0.48 0.71 0.00 0.00 12040 50
SKT 1 0.01 1.02 0.01 0.75 108 K 308 K 0.55 0.77 0.00 0.00 18536 45
---------------------------------------------------------------------------------------------------------------
TOTAL * 0.01 1.21 0.01 0.83 393 K 941 K 0.50 0.74 0.00 0.00 N/A N/A
Instructions retired: 913 M ; Active cycles: 752 M ; Time (TSC): 2597 Mticks ; C0 (active,non-halted) core residency: 1.25 %
C1 core residency: 98.75 %; C6 core residency: 0.00 %;
C0 package residency: 100.00 %; C2 package residency: 0.00 %; C6 package residency: 0.00 %;
┌────────────────────────────────────────────────────────────────────────────────┐
Core C-state distribution│01111111111111111111111111111111111111111111111111111111111111111111111111111111│
└────────────────────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────────────────────┐
Package C-state distribution│00000000000000000000000000000000000000000000000000000000000000000000000000000000│
└────────────────────────────────────────────────────────────────────────────────┘
PHYSICAL CORE IPC : 1.21 => corresponds to 30.33 % utilization for cores in active state
Instructions per nominal CPU cycle: 0.01 => corresponds to 0.31 % core utilization over time interval
SMI count: 0
Intel(r) UPI data traffic estimation in bytes (data traffic coming to CPU/socket through UPI links):
UPI0 UPI1 UPI2 | UPI0 UPI1 UPI2
---------------------------------------------------------------------------------------------------------------
SKT 0 64 M 64 M 0 | 0% 0% 0%
SKT 1 43 M 43 M 0 | 0% 0% 0%
---------------------------------------------------------------------------------------------------------------
Total UPI incoming data traffic: 215 M UPI data traffic/Memory controller traffic: 0.43
Intel(r) UPI traffic estimation in bytes (data and non-data traffic outgoing from CPU/socket through UPI links):
UPI0 UPI1 UPI2 | UPI0 UPI1 UPI2
---------------------------------------------------------------------------------------------------------------
SKT 0 128 M 127 M 0 | 0% 0% 0%
SKT 1 145 M 147 M 0 | 0% 0% 0%
---------------------------------------------------------------------------------------------------------------
Total UPI outgoing data and non-data traffic: 550 M
MEM (GB)->| READ | WRITE | LOCAL | CPU energy | DIMM energy | UncFREQ (Ghz)
---------------------------------------------------------------------------------------------------------------
SKT 0 0.13 0.09 63 % 49.26 20.29 2.40
SKT 1 0.16 0.12 35 % 47.37 20.73 2.40
---------------------------------------------------------------------------------------------------------------
* 0.29 0.21 48 % 96.63 41.03 2.40
Cleaning up
Closed perf event handles
Zeroed uncore PMU registers
Freeing up all RMIDs
# curl --silent http://localhost:9738/metrics | grep Memory_Bandwidth
Local_Memory_Bandwidth{socket="0",core="0",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="0",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="6",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="6",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="1",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="1",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="5",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="5",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="2",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="2",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="4",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="4",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="3",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="3",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="14",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="14",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="8",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="8",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="13",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="13",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="9",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="9",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="12",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="12",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="10",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="10",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="11",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="11",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",aggregate="socket",source="core"} 0
Remote_Memory_Bandwidth{socket="0",aggregate="socket",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="0",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="0",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="6",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="6",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="1",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="1",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="5",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="5",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="2",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="2",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="4",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="4",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="3",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="3",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="14",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="14",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="8",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="8",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="13",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="13",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="9",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="9",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="12",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="12",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="10",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="10",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="11",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="11",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",aggregate="socket",source="core"} 0
Remote_Memory_Bandwidth{socket="1",aggregate="socket",source="core"} 0
Local_Memory_Bandwidth{aggregate="system",source="core"} 0
Remote_Memory_Bandwidth{aggregate="system",source="core"} 0
Processor Counter Monitor (202201-1)
It seems you are using an old version. Could you please run the latest version (master branch) and set the new PCM_ENFORCE_MBM=1 environment variable?
Thank you so much, the latest version works perfectly.