Stephen Lien Harrell
Really only a data-reduction problem: sampling size.
While the feature works, the way the website is built means the xalt lookup times out the entire machine page. This will need to be retested once https://github.com/TACC/tacc_stats/issues/54 is complete.
Ok, do this for all cfg imports
For CPU: need core affinity matched to the job ID. For memory: need to find all memory usage from the primary job starter programmatically. Find the job starter, then get all child process memory:...
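A minimal sketch of the CPU and memory approach in the note above, assuming a Linux `/proc` filesystem and that the job starter's PID is already known (how to find it is left open, as in the note). It builds a pid-to-parent map from `/proc/<pid>/stat`, walks every descendant of the starter, sums their `VmRSS`, and unions their CPU affinity masks. The names `descendants`, `read_ppid_map`, `rss_kib`, `job_memory_kib`, and `job_cpus` are hypothetical helpers, not part of tacc_stats.

```python
import os


def descendants(ppid_of: dict[int, int], root: int) -> set[int]:
    """Return root plus every process whose parent chain leads back to root."""
    children: dict[int, list[int]] = {}
    for pid, ppid in ppid_of.items():
        children.setdefault(ppid, []).append(pid)
    found: set[int] = set()
    stack = [root]
    while stack:
        pid = stack.pop()
        if pid in found:
            continue
        found.add(pid)
        stack.extend(children.get(pid, []))
    return found


def read_ppid_map(proc: str = "/proc") -> dict[int, int]:
    """Map every visible pid to its parent pid using /proc/<pid>/stat."""
    ppid_of = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                stat = f.read()
        except OSError:  # process exited while we were scanning
            continue
        # comm is parenthesized and may contain spaces, so split after the
        # last ')': the remaining fields start with state, then ppid.
        fields = stat.rsplit(")", 1)[1].split()
        ppid_of[int(entry)] = int(fields[1])
    return ppid_of


def rss_kib(pid: int, proc: str = "/proc") -> int:
    """Resident set size in KiB, read from the VmRSS line of /proc/<pid>/status."""
    try:
        with open(os.path.join(proc, str(pid), "status")) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return 0  # kernel threads have no VmRSS line; exited processes count as 0


def job_memory_kib(starter_pid: int) -> int:
    """Sum of RSS over the starter and all of its descendants."""
    return sum(rss_kib(pid) for pid in descendants(read_ppid_map(), starter_pid))


def job_cpus(starter_pid: int) -> set[int]:
    """Union of the CPU affinity masks of every process in the job's tree."""
    cpus: set[int] = set()
    for pid in descendants(read_ppid_map(), starter_pid):
        try:
            cpus |= os.sched_getaffinity(pid)
        except OSError:  # process exited between the scan and the query
            continue
    return cpus
```

Summing RSS counts shared pages (for example shared libraries or shared memory segments) once per process, so the total is an upper bound; `/proc/<pid>/smaps_rollup` exposes a proportional `Pss` figure if a fairer split is needed.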
Regarding the approach above, we need to make sure we can capture detached processes.
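A process that detaches (double fork plus `setsid`) is reparented to init or a subreaper, so walking the parent chain from the job starter misses it. One possible workaround, sketched here under assumptions: the environment is inherited across fork and exec, so a detached process usually still carries the scheduler's job-ID variable. The sketch assumes Slurm's `SLURM_JOB_ID`; `parse_environ` and `pids_with_job_id` are hypothetical helpers. Reading `/proc/<pid>/environ` needs root or the same UID, and a process that scrubs its environment would still escape. If jobs are confined with cgroups, reading the job's cgroup membership is another option.

```python
import os


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE contents of /proc/<pid>/environ."""
    env = {}
    for item in raw.split(b"\0"):
        key, sep, value = item.partition(b"=")
        if sep:  # skip the empty trailing item and malformed entries
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def pids_with_job_id(job_id: str, var: str = "SLURM_JOB_ID",
                     proc: str = "/proc") -> set[int]:
    """Find processes whose environment carries the job ID, detached or not."""
    matches = set()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "environ"), "rb") as f:
                env = parse_environ(f.read())
        except OSError:  # exited, or not readable without privileges
            continue
        if env.get(var) == job_id:
            matches.add(int(entry))
    return matches
```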
Check whether we can get Redfish working to get a more complete picture.
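A minimal sketch of pulling node power from a BMC over Redfish, assuming the classic `Power` resource at `/redfish/v1/Chassis/<id>/Power`, whose `PowerControl` entries carry `PowerConsumedWatts`. Newer Redfish schema versions move power data to other resources (for example `PowerSubsystem`), so the path and field are assumptions to check against the target BMCs. `fetch_json` and `consumed_watts` are hypothetical helpers using only the standard library.

```python
import base64
import json
import ssl
import urllib.request


def consumed_watts(power_doc: dict) -> list[float]:
    """Extract PowerConsumedWatts from each PowerControl entry of a Power resource."""
    return [
        float(ctl["PowerConsumedWatts"])
        for ctl in power_doc.get("PowerControl", [])
        if ctl.get("PowerConsumedWatts") is not None
    ]


def fetch_json(url: str, user: str, password: str, verify_tls: bool = True) -> dict:
    """GET a Redfish resource with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}", "Accept": "application/json"}
    )
    ctx = ssl.create_default_context()
    if not verify_tls:  # many BMCs ship self-signed certificates
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)
```

For example, `consumed_watts(fetch_json("https://<bmc>/redfish/v1/Chassis/<id>/Power", user, password))`, where the BMC host and chassis ID are placeholders; listing `/redfish/v1/Chassis` shows which chassis IDs a given BMC exposes.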
Branch for this issue: https://github.com/TACC/tacc_stats/tree/dcgm_support
Using this document to see which metrics Cazes wants: https://github.com/NVIDIA/dcgm-exporter/blob/main/etc/dcp-metrics-included.csv (From Cazes:) "From the PCI section, I’d like to keep track of bytes moved over the PCI bus. We probably..."
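To turn periodic PCIe samples into total bytes moved, a sketch assuming the PCIe fields (for example `DCGM_FI_PROF_PCIE_TX_BYTES` and `DCGM_FI_PROF_PCIE_RX_BYTES`; verify the names against the CSV above) are sampled as rates in bytes per second. Integrating the rate over time with the trapezoidal rule gives bytes transferred per direction; `bytes_moved` is a hypothetical helper. If a field turns out to be a cumulative counter instead, the difference between the last and first samples gives the total directly.

```python
def bytes_moved(samples: list[tuple[float, float]]) -> float:
    """Integrate (timestamp_s, bytes_per_s) samples with the trapezoidal rule."""
    ordered = sorted(samples)  # tolerate out-of-order samples
    total = 0.0
    for (t0, r0), (t1, r1) in zip(ordered, ordered[1:]):
        total += 0.5 * (r0 + r1) * (t1 - t0)
    return total
```

Running it once on the TX samples and once on the RX samples keeps the two directions separate, which matters when a job mostly copies data in one direction.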
Create new pre-made graphs and a page for zooming in on them.
Talk to Carlos at Intel