netdata_nv_plugin

GPU used by a user

Open scatenag opened this issue 5 years ago • 4 comments

Let me rephrase this.

It would be nice to have 'per user' GPU monitoring.

Similar to what we already get with users_cpu_user_percentage_average, I would like to have users_gpu_user_percentage_average.

I have seen something similar in gpustat.

Could it be possible to get some useful info?

If I'm not wrong, both this plugin and gpustat use the same library, pynvml...

I asked the same here...

--- Original question ---

Would it be possible to create an alarm that triggers when a user uses more than one GPU?

That is: if a GPU has running processes from more than one user?
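The two conditions mentioned above (one user on several GPUs, and one GPU with processes from several users) can be sketched as a plain check over a process list. This is only an illustration: `sharing_report`, the `(gpu_index, username)` input format, and the sample data are assumptions made for the example, not part of the plugin or of Netdata's alarm syntax.

```python
from collections import defaultdict


def sharing_report(processes):
    """Summarise GPU sharing from (gpu_index, username) pairs.

    Each pair stands for one running GPU process (hypothetical input format).
    Returns two dicts:
      - GPUs that have processes from more than one user
      - users that have processes on more than one GPU
    """
    users_per_gpu = defaultdict(set)
    gpus_per_user = defaultdict(set)
    for gpu_index, user in processes:
        users_per_gpu[gpu_index].add(user)
        gpus_per_user[user].add(gpu_index)

    shared_gpus = {
        gpu: sorted(users) for gpu, users in users_per_gpu.items() if len(users) > 1
    }
    multi_gpu_users = {
        user: sorted(gpus) for user, gpus in gpus_per_user.items() if len(gpus) > 1
    }
    return shared_gpus, multi_gpu_users


# Made-up sample data for illustration only.
sample = [(0, "alice"), (0, "bob"), (1, "alice"), (2, "carol")]
shared, multi = sharing_report(sample)
print(shared)  # {0: ['alice', 'bob']}
print(multi)   # {'alice': [0, 1]}
```

An alarm could then fire whenever either dict is non-empty, depending on which of the two conditions is the one that matters.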

Thanks

scatenag avatar May 06 '20 14:05 scatenag

One can add a chart entry for GPU memory utilization per GPU, which consists of many lines (one per user). This should be straightforward to implement.
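As a rough sketch of the data behind such a chart: one chart per GPU, with one line (dimension) per user holding that user's summed memory. The function name `memory_by_user` and the `(gpu_index, username, used_bytes)` record format are assumptions for this example; how the result is wired into an actual chart definition is left out.

```python
from collections import defaultdict

MIB = 1024 * 1024


def memory_by_user(records):
    """Sum GPU memory per user, grouped by GPU.

    records: iterable of (gpu_index, username, used_bytes) tuples
             (hypothetical format, one tuple per GPU process).
    Returns {gpu_index: {username: used_mib}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for gpu_index, user, used_bytes in records:
        totals[gpu_index][user] += used_bytes
    # Convert to whole MiB so the values are easy to plot.
    return {
        gpu: {user: used // MIB for user, used in users.items()}
        for gpu, users in totals.items()
    }


# Made-up sample data for illustration only.
sample = [
    (0, "alice", 512 * MIB),
    (0, "alice", 256 * MIB),
    (0, "bob", 1024 * MIB),
    (1, "bob", 2048 * MIB),
]
charts = memory_by_user(sample)
print(charts)  # {0: {'alice': 768, 'bob': 1024}, 1: {'bob': 2048}}
```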

wookayin avatar Jun 18 '20 02:06 wookayin

For the moment, I have implemented per-user GPU memory utilization by modifying the nvidia-smi collector: https://github.com/netdata/netdata/pull/9372

Implementing it with pynvml should be more efficient (but, for the moment, I am not enough of a Python ninja to try).
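For reference, a minimal sketch of what a pynvml-based collection might look like. It is untested against this plugin and rests on several assumptions: Linux only (the process owner is read from `/proc`), only compute processes are listed, and the helper names `username_for_pid` and `collect_gpu_processes` are made up for the example. The pynvml calls used are `nvmlInit`, `nvmlDeviceGetCount`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetComputeRunningProcesses`, and `nvmlShutdown`.

```python
import os
import pwd


def username_for_pid(pid):
    """Resolve the owner of a process via /proc (Linux only).

    Returns 'unknown' if the process cannot be inspected, and the numeric
    uid as a string if the uid has no passwd entry.
    """
    try:
        uid = os.stat(f"/proc/{pid}").st_uid
    except OSError:
        return "unknown"
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def collect_gpu_processes():
    """Return (gpu_index, username, used_bytes) for each compute process.

    Requires the pynvml package and an NVIDIA driver; imported lazily so
    that username_for_pid stays usable without them.
    """
    import pynvml

    pynvml.nvmlInit()
    try:
        records = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
                # usedGpuMemory can be None when the driver does not report it.
                used = proc.usedGpuMemory or 0
                records.append((index, username_for_pid(proc.pid), used))
        return records
    finally:
        pynvml.nvmlShutdown()
```

Calling `collect_gpu_processes()` on a machine with a GPU should give one tuple per compute process, which can then be summed per user.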

Do you think it would be possible to monitor per user GPU utilization (not memory)?

scatenag avatar Jun 18 '20 11:06 scatenag

Correct me if I'm wrong, but the only per-user metric that can be extracted is how much memory is used. I am not able to find a metric for percent GPU usage per user.

Please let me know if I am just blind ^^

coraxx avatar Apr 18 '21 08:04 coraxx

You are right! I haven't found any way to get the percentage of GPU usage per user.

With the nvidia-smi plugin, I use the percentage of GPU memory used by each user as an approximation of that user's GPU usage. For user accounting, this is still better than having no per-user info.
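The approximation described above boils down to a simple ratio: a user's share is their used GPU memory divided by the GPU's total memory. A small sketch, where `memory_share_percent` and the sample figures are assumptions for illustration (with pynvml, the total could presumably be taken from `nvmlDeviceGetMemoryInfo(handle).total`):

```python
GIB = 1024 ** 3


def memory_share_percent(used_by_user, total_bytes):
    """Return each user's share of GPU memory as a percentage.

    used_by_user: {username: used_bytes} for a single GPU (hypothetical format).
    total_bytes:  total memory of that GPU in bytes.
    Note: this measures memory held, not compute utilization.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return {
        user: round(100.0 * used / total_bytes, 1)
        for user, used in used_by_user.items()
    }


# Made-up sample data: a 16 GiB GPU shared by two users.
shares = memory_share_percent({"alice": 4 * GIB, "bob": 2 * GIB}, 16 * GIB)
print(shares)  # {'alice': 25.0, 'bob': 12.5}
```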

scatenag avatar Apr 19 '21 08:04 scatenag