Afzal
I am running some experiments using NVML and a CUDA GEMM implementation to measure power consumption. I measured the following trend in power consumption when multiplying two 16384 × 16384 square matrices....
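A measurement like this is typically collected by polling NVML while the kernel runs. The following is a minimal sketch, not the setup used in the experiment above: it assumes the `pynvml` bindings are installed, and the helper names `nvml_power_reader`, `sample_power`, and `mean_power` are hypothetical. The reader is passed in as a callable so the sampling loop can be exercised without a GPU.

```python
import time
from typing import Callable, List, Tuple


def nvml_power_reader(device_index: int = 0) -> Callable[[], float]:
    """Return a function that reads board power in watts via NVML.

    Assumption: the `pynvml` package is installed and an NVIDIA GPU is present.
    """
    import pynvml  # imported lazily so the rest of the sketch runs without a GPU

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    # nvmlDeviceGetPowerUsage reports milliwatts, so convert to watts.
    return lambda: pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0


def sample_power(
    read_watts: Callable[[], float],
    n_samples: int,
    interval_s: float = 0.1,
) -> List[Tuple[float, float]]:
    """Collect (timestamp_seconds, watts) pairs at a fixed polling interval."""
    samples = []
    for _ in range(n_samples):
        samples.append((time.monotonic(), read_watts()))
        time.sleep(interval_s)
    return samples


def mean_power(samples: List[Tuple[float, float]]) -> float:
    """Average power in watts over the collected samples."""
    return sum(watts for _, watts in samples) / len(samples)
```

For example, `sample_power(nvml_power_reader(0), n_samples=100)` would collect roughly ten seconds of readings at the default 0.1 s interval, provided the GEMM is running concurrently in another thread or process.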
Thanks for making your project open-source. The paper states that you utilized small anchor sizes for the TUM dataset (Section V A. (b)) but it doesn't specify the anchor...
In your GPU benchmark, you set the persistence mode to ON and then lock the GPU clocks to 1530,1530 as follows:

``` python3
process = subprocess.Popen(
    'sudo nvidia-smi --lock-gpu-clocks=1530,1530'.split(' '),...
```
## ❓ Questions and Help

Hi, I have noticed that when `world_size == 1`, `all_reduce` is a no-op and does not apply `scale`:

In `torch_xla.core.xla_model` in `def all_reduce`:

```
#...
```