[XNACK] GPU is asleep during the copy and not waking back up when it should.
The cause appears to be that the GPU is asleep during the copy and not waking back up when it should.
Changing the grub options allowed these tests to pass on my test machine. https://github.com/ROCm/ROCm/issues/2418#issuecomment-1702415574
Originally posted by @cderb in https://github.com/ROCm/MIOpen/issues/2864#issuecomment-2089260788
@cderb let's check whether we already have an internal ticket for this; if not, we should create one for the runtime and driver. Thanks! FYI: @JehandadKhan @atamazov
@cderb What is the base driver version? 5.6, as mentioned in https://github.com/ROCm/ROCm/issues/2418?
@atamazov this test Docker container was running ROCm 6.1.0-82
@cderb Thanks, but the base driver is not included in the image. Can you please provide the output of the following (run it outside the container):
modinfo amdgpu | grep -i -E "(version:)|(vermagic:)"
or
/opt/rocm/bin/rocm-smi --showdriverversion
@atamazov
version: 5.18.13
srcversion: 7D4E7C8EA7D467BB8AED6A1
vermagic: 5.15.0-105-generic SMP mod_unload modversions
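If the same check needs to be scripted (for example, across several CI nodes), a minimal Python sketch follows. It assumes `modinfo` is on the host's PATH and is run outside the container, as requested above. The helper names `parse_modinfo` and `query_amdgpu_driver` are hypothetical and are not part of ROCm. Note that the grep pattern `version:` also matches `srcversion:`, which is why that line appears in the output above.

```python
import re
import subprocess

# Fields of interest; "srcversion" is listed explicitly because a plain
# "version:" grep matches it too.
_FIELD_RE = re.compile(r"^\s*(version|srcversion|vermagic):\s*(.*\S)\s*$")


def parse_modinfo(text: str) -> dict[str, str]:
    """Extract the version, srcversion and vermagic fields from modinfo output."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # modinfo prints one "key:   value" pair per line
        match = _FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def query_amdgpu_driver() -> dict[str, str]:
    """Run `modinfo amdgpu` on the host (hypothetical helper) and parse the result."""
    result = subprocess.run(
        ["modinfo", "amdgpu"], capture_output=True, text=True, check=True
    )
    return parse_modinfo(result.stdout)


if __name__ == "__main__":
    # Sample taken from the output posted above.
    sample = (
        "version:        5.18.13\n"
        "srcversion:     7D4E7C8EA7D467BB8AED6A1\n"
        "vermagic:       5.15.0-105-generic SMP mod_unload modversions\n"
    )
    print(parse_modinfo(sample))
```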
Does that mean updating the base driver on this machine might resolve the issue?
However, the base driver version on our CI nodes is 6.2.4, and I believe we observe the same issue there.
@cderb Hmm... The most recent ROCm release is 6.1.0, so how can the CI nodes have 6.2.4 installed?