procfs icon indicating copy to clipboard operation
procfs copied to clipboard

Add new sysfs class for Amazon Elastic Fabric Adapter

Open perifaws opened this issue 2 years ago • 2 comments

This change adds a new sysfs class to read metrics from Amazon Elastic Fabric Adapter (EFA). This change is based on the Infiniband class.

EFA is supported on a variety of Amazon EC2 instances (list here) and is relevant for HPC & distributed training (ML) applications in the same fashion as Infiniband.

There's an associated collector for the node_exporter generated for validation. Happy to provide a sample output as requested. Thanks!

Related to the Prometheus Google Groups thread: https://groups.google.com/g/prometheus-developers/c/MEal59mDebs/m/ZQBU1f0hCAAJ

perifaws avatar May 08 '23 17:05 perifaws

Can you please add some unit tests with examples of what the /sys structure looks like? Otherwise this code will be impossible to maintain with confidence.

matthiasr avatar May 14 '23 07:05 matthiasr

What's EFA specific about the collector? I can't see anywhere that it checks the PCI device ID or something like that for an Amazon VID/PID. Looks like it just looks in the normal infiniband directories?

eg if I have a random Mellanox IB device, will this collector ignore it?

dcbw avatar May 17 '23 21:05 dcbw