400% to 700% power usage increase when an Nvidia GPU is detected
Describe the bug
While turing-smart-screen-python is running, the Nvidia GPU stays at maximum frequency and high temperature, and draws about 8 times the normal power when idle.
To Reproduce
Steps to reproduce the behavior:
- Have an Nvidia GPU;
- Start turing-smart-screen-python;
- Observe odd GPU behavior regarding power consumption, frequency and temperature.
Expected behavior
GPU frequency, temperature, and power usage are not significantly impacted by turing-smart-screen-python.
Screenshots / photos of the Turing screen
nvtop screenshot using the custom sensors below:
nvtop screenshot using the default Nvidia detection:
Environment:
- Smart screen model: This one, the unnamed 3.5".
- Revision of this project: main, commit 0f68e15b024a5099e57ace3f969c1219e467df08
- OS with version: Ubuntu 22.04
- Python version: 3.10.13
- Hardware: AMD Ryzen 5 5600G, NVIDIA GeForce RTX 3060 (12 GB), 64 GB RAM
Additional context
Over the last few days I had been observing odd behavior on my headless desktop: the GPU temperature and frequency were always high, as if the card were in use, and power usage was far above normal (from 5 W to 40 W at idle). While investigating, I found that the problem only happened when turing-smart-screen-python was running. I tried commenting out the entire GPU: sections from the theme file, but the problem persisted.
To work around this, I edited sensors_python.py and removed the Nvidia detection; GPU temperature and frequency then returned to normal.
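To try to pin down what exactly in the detection triggers this, a minimal polling loop like the sketch below could be used (this assumes the built-in Nvidia path keeps NVML bindings such as pynvml initialized, which I have not verified; it may use something else entirely). Running it on its own while watching nvtop shows whether simply holding an NVML handle and polling it once per second is enough to keep the card out of its idle power state:

import time

import pynvml  # assumption: NVML bindings; the built-in path may or may not use these

# Keep an NVML context open and poll once per second, to check whether this
# alone is enough to hold the GPU out of its low-power idle state (watch
# clocks/power in nvtop while this runs).
pynvml.nvmlInit()
try:
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    while True:
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_GRAPHICS)
        power = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000  # milliwatts -> watts
        print("temp={}°C  clock={}MHz  power={:.1f}W".format(temp, clock, power))
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()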
What is even stranger is that when I set up custom sensors to read the GPU data, there was no change in power consumption, temperature or frequency at all (WARNING: Works on my machine™ code):
# Imports needed by the custom sensors below
import subprocess
import time


class nvGPUFreq(CustomDataSource):
    def as_numeric(self) -> float:
        # Numeric value is not used for this sensor
        pass

    def as_string(self) -> str:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            coreFreq = linhaDividida[5].strip()
            return '{}MHz'.format(coreFreq).rjust(7)
        except Exception as err:
            print(err)
            return ''


class nvGPUTemp(CustomDataSource):
    def as_numeric(self) -> float:
        # Numeric value is not used for this sensor
        pass

    def as_string(self) -> str:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            gpuTemp = linhaDividida[2].strip()
            return '{}°C'.format(gpuTemp).rjust(5)
        except Exception as err:
            print(err)
            return ''


class nvGPUMem(CustomDataSource):
    def as_numeric(self) -> float:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            # Sum of the three memory columns reported by dmon
            gpuMem = int(linhaDividida[6]) + int(linhaDividida[7]) + int(linhaDividida[8])
            return gpuMem
        except Exception as err:
            print(err)
            return 0

    def as_string(self) -> str:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            gpuMem = int(linhaDividida[6]) + int(linhaDividida[7]) + int(linhaDividida[8])
            return '{} MB'.format(gpuMem).rjust(8)
        except Exception as err:
            print(err)
            return ''


class nvGPUMemPercent(CustomDataSource):
    def as_numeric(self) -> float:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            # Used memory divided by memory.total (appended as the last field)
            gpuMemPercent = int(round(100 * (int(linhaDividida[6]) + int(linhaDividida[7]) + int(linhaDividida[8])) / int(linhaDividida[15]), 0))
            print(gpuMemPercent)  # debug output
            return gpuMemPercent
        except Exception as err:
            print(err)
            return 0

    def as_string(self) -> str:
        # String value is not used for this sensor
        pass


# Cached nvidia-smi output and timestamp of the last call, so the command runs
# at most once per second no matter how many sensors ask for data.
saidaNvidia = ""
ultimaExecucaoNvidiaSMI = 0


def obtemDadosNvidia():
    global saidaNvidia
    global ultimaExecucaoNvidiaSMI
    # Reuse the cached line if nvidia-smi ran less than a second ago
    if time.time() - ultimaExecucaoNvidiaSMI < 1:
        return saidaNvidia
    ultimaExecucaoNvidiaSMI = time.time()
    # One-shot dmon sample: power/temperature, clocks, memory and utilization columns
    processoNV = subprocess.Popen(["nvidia-smi", "dmon", "-s", "pcmu", "-c", "1"],
                                  stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    saidaEErro = processoNV.communicate()
    for linha in saidaEErro[0].decode(encoding="utf-8").strip().split('\n'):
        # Skip the '#' header lines and use the first data line
        if linha.startswith('#'):
            continue
        # Append memory.total so the percentage sensor has its denominator
        processoNV_2 = subprocess.Popen(["nvidia-smi", "--query-gpu", "memory.total", "--id=0", "--format=csv,nounits,noheader"],
                                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        saidaEErro_2 = processoNV_2.communicate()
        saidaNvidia = linha + " " + saidaEErro_2[0].decode(encoding="utf-8").strip()
        print(linha)  # debug output
        return saidaNvidia  # was "return linha"; return the cached line incl. memory.total
    return saidaNvidia
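For reference, the sensors above can be smoke-tested outside the project with something like this (hypothetical standalone usage; in the real setup they are instantiated by turing-smart-screen-python itself, and CustomDataSource comes from the project):

# Standalone smoke test: print one reading from each custom sensor.
# Assumes CustomDataSource is defined or importable in this context.
if __name__ == "__main__":
    print(nvGPUFreq().as_string())
    print(nvGPUTemp().as_string())
    print(nvGPUMem().as_string())
    print(nvGPUMemPercent().as_numeric(), '%')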
I doubt this is a bug introduced by turing-smart-screen-python itself, since an older version I still had available now shows the same behavior. It may be related to kernel or Nvidia driver updates that changed something and are now triggering this abnormal behavior. However, if this becomes the new "default", it may cause problems, such as cooking GPUs.
Information about the Nvidia driver: Driver Version 555.42.06, CUDA Version 12.5. Tested with the 5.15 and 6.5 kernels available in the Ubuntu repository.
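To compare the two situations, the performance state and idle readings can be sampled with a standard nvidia-smi query, for example the sketch below (pstate, clocks.gr, temperature.gpu and power.draw are standard --query-gpu fields; P8 is the usual idle state, while a card held at full clocks typically reports P0):

import subprocess

# Sample performance state, graphics clock, temperature and power draw once;
# run it with and without turing-smart-screen-python running to compare.
resultado = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=pstate,clocks.gr,temperature.gpu,power.draw",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True)
print(resultado.stdout.strip())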