krishh85

Results 19 comments of krishh85

@nikkon-dev Any pointers would be greatly appreciated. Thanks

@nikkon-dev @bmarchant, gentle ping on this question.

@nvvfedorov Right, the question was specific to MIG instances, like the metric below (dcgm_fi_prof_gr_engine_active), where there is a non-zero value (which I assume indicates that GPUs are being used and pods...

@nvvfedorov Any update on this? Thanks

@nvvfedorov We also ran a load test to simulate traffic for a period of time (30 mins) and observed that none of the MIG metrics had container_name, pod_name, pod_namespace info....
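A check like the one described above could be sketched roughly as follows. This is only an illustration: the label names come from the comment, the sample text and its values are made up, and the helper names (`parse_samples`, `samples_missing_pod_labels`) are hypothetical, not part of dcgm-exporter.

```python
import re

# Label names as written in the comment above; the exporter's actual
# label keys may differ between versions (assumption).
POD_LABELS = ("container_name", "pod_name", "pod_namespace")

# Minimal parser for Prometheus text-format sample lines: name{labels} value
_SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP/TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def samples_missing_pod_labels(text, metric):
    """Return (labels, value) for samples of `metric` with any pod label absent or empty."""
    missing = []
    for name, labels, value in parse_samples(text):
        if name.lower() != metric.lower():
            continue
        if not all(labels.get(key) for key in POD_LABELS):
            missing.append((labels, value))
    return missing


# Hypothetical scrape output, invented for illustration only.
SAMPLE = """
# TYPE dcgm_fi_prof_gr_engine_active gauge
dcgm_fi_prof_gr_engine_active{gpu="0",pod_name="",pod_namespace="",container_name=""} 0.42
dcgm_fi_prof_gr_engine_active{gpu="1",pod_name="infer-0",pod_namespace="ml",container_name="server"} 0.10
"""

for labels, value in samples_missing_pod_labels(SAMPLE, "dcgm_fi_prof_gr_engine_active"):
    print("no pod attribution:", labels.get("gpu"), value)
```

To run it against a live exporter, the text could be fetched from whatever address the /metrics endpoint is served on (for example with `urllib.request.urlopen`) and passed in place of `SAMPLE`.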

@nvvfedorov 1. Ran a script which captures dcgm-exporter metrics from the localhost /metrics endpoint. 2. Set up inferencing requests on a model served from a host. The host is an A100 GPU...

@nvvfedorov Any update on this? Should be a simple test to see if it works as expected in your tests, and if it does, we can check if this is...

@nvvfedorov Based on the [code](https://github.com/NVIDIA/dcgm-exporter/blob/main/pkg/dcgmexporter/kubernetes.go#L150), it seems like this is disabled for MIG resource names. Can you please confirm, and if so, is there a reason why this is not supported?

@nvvfedorov, added the details. We use the MIXED strategy with 2 MIG slices on 4 GPUs (3g.40gb & 4g.40gb). env: - name: MIG_STRATEGY value: mixed - name: NVIDIA_MIG_MONITOR_DEVICES value: all -...

@nvvfedorov I doubt I will be able to do it as I won't be able to download & run external packages on hosts without several reviews. Since you know the...