pti-gpu

[BUG][onetrace][IMME CmdList] Tool reports fewer kernel calls than were actually submitted

Open xunsongh opened this issue 2 years ago • 2 comments

I made a simple test case that submits and executes 101 kernels (1 M2D and 100 add_kernel) with immediate command lists enabled on PVC. I ran it under onetrace with the -s flag. In the report, only 1 M2D and 78 add_kernel calls were captured, and Append(ns) is always 0. [image] I suspect this is a bug, so I am reporting it and hope for a quick fix. Thank you.

xunsongh • Aug 24 '23 10:08

[Additional] I tried adding a queue.wait() at the end to make sure every kernel runs to completion, but the number of calls is still wrong. [image]

xunsongh • Aug 24 '23 11:08

@xunsongh Sorry for the delayed response. Yes, this issue is reproduced (on the simplest dpc_gemm sample) and is likely caused by runtime or Level Zero behavior related to the immediate command lists change. We are exploring the best solution and should post an update (fix) soon.

jfedorov • Dec 05 '23 09:12

@xunsongh Please use unitrace instead.

zma2 • Oct 30 '25 22:10