MIOpen

[XNACK] GPU is asleep during the copy and not waking back up when it should.

Open junliume opened this issue 1 year ago • 7 comments

The cause appears to be that the GPU is asleep during the copy and not waking back up when it should.

Changing the grub options allowed these tests to pass on my test machine. https://github.com/ROCm/ROCm/issues/2418#issuecomment-1702415574

Originally posted by @cderb in https://github.com/ROCm/MIOpen/issues/2864#issuecomment-2089260788

junliume avatar May 02 '24 00:05 junliume

@cderb Let's check whether we already have an internal ticket for this; if not, we should create one for the runtime and driver. Thanks! FYI: @JehandadKhan @atamazov

junliume avatar May 02 '24 00:05 junliume

@cderb What is the base driver version? 5.6, as mentioned in https://github.com/ROCm/ROCm/issues/2418?

atamazov avatar May 02 '24 14:05 atamazov

@atamazov The test Docker image was on ROCm 6.1.0-82.

cderb avatar May 06 '24 15:05 cderb

@cderb Thanks, but the base driver is not included in the image. Can you please provide the output of one of the following (run it outside the container):

modinfo amdgpu | grep -i -E "(version:)|(vermagic:)"

or

/opt/rocm/bin/rocm-smi --showdriverversion
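
For convenience, the two checks above could be combined into one small script. This is only a sketch: the function names `extract_driver_version` and `show_driver_version` are hypothetical helpers made up for illustration, and it assumes the `modinfo` output contains a `version:` line (some in-tree kernel builds of amdgpu may not report one, in which case nothing is printed).

```shell
#!/bin/sh
# Sketch: report the host amdgpu driver version (run outside the container).

# Hypothetical helper: read modinfo-style text on stdin and print the value
# of the "version:" field. "srcversion:" and "vermagic:" lines are ignored
# because the first field must match "version:" exactly.
extract_driver_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Hypothetical helper: try modinfo first, then fall back to rocm-smi.
show_driver_version() {
    if command -v modinfo >/dev/null 2>&1 && modinfo amdgpu >/dev/null 2>&1; then
        modinfo amdgpu | extract_driver_version
    elif [ -x /opt/rocm/bin/rocm-smi ]; then
        /opt/rocm/bin/rocm-smi --showdriverversion
    else
        echo "amdgpu driver information not available" >&2
        return 1
    fi
}
```

After sourcing the script on the host, calling `show_driver_version` should print the driver version if either tool is available.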

atamazov avatar May 06 '24 15:05 atamazov

@atamazov

version:        5.18.13
srcversion:     7D4E7C8EA7D467BB8AED6A1
vermagic:       5.15.0-105-generic SMP mod_unload modversions

Perhaps that would mean updating the base driver on this machine could resolve this issue?

cderb avatar May 06 '24 16:05 cderb

However, the base driver version on our CI nodes is 6.2.4, and I believe we observe the same issue there.

cderb avatar May 06 '24 16:05 cderb

@cderb Hmm... The most recent released ROCm is 6.1.0, so how can the CI nodes have 6.2.4 installed?

atamazov avatar May 06 '24 17:05 atamazov