
More time is spent in user mode when rocprofiler is used with MPI.

Open arfio opened this issue 3 years ago • 1 comments

When running an MPI program with rocprof, the user time is 39% less than without it. Looking at the Linux kernel trace captured with the LTTng tracer, we can see that the main process of each rank is waiting half the time when rocprof is enabled, whereas it stays in the running state without it. Synchronizing the Linux kernel trace with the rocprof trace shows that this waiting coincides with the memory transfer calls.

In the images, blue indicates that the thread is in kernel mode, green indicates user mode, and a yellow line means that the thread is waiting.

[Image: trace with rocprof] [Image: trace without rocprof]
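
For anyone who wants to check the user-time difference without a full kernel trace, a minimal sketch follows. It is not the method used for the numbers above. It assumes a Unix system and relies only on Python's standard `os.times()`; the helper names `cpu_times` and `relative_change` are hypothetical, introduced here for illustration.

```python
import os
import subprocess
import sys


def cpu_times(cmd):
    """Run cmd to completion and return (user_s, system_s) CPU seconds.

    The values cover the launched process and any descendants it waited for,
    as reported by the children_* fields of os.times() (Unix only).
    """
    before = os.times()
    subprocess.run(cmd, check=True)  # blocks until the child has exited
    after = os.times()
    return (after.children_user - before.children_user,
            after.children_system - before.children_system)


def relative_change(baseline, measured):
    """Fractional change of measured relative to baseline (-0.39 means 39% less)."""
    return (measured - baseline) / baseline
```

Calling `cpu_times` once on the plain MPI launch command and once on the same command wrapped with rocprof, then passing the two user times to `relative_change`, gives a figure comparable to the 39% quoted above. The exact launch commands depend on the local setup and are not specified here.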

arfio avatar Feb 25 '22 18:02 arfio

Can't reproduce with a relatively large kernel. Rocprofiler submits additional packets to hsa_queue, forcing sched_yield(). There are no additional switches to kernel space.

kikimych avatar Jul 01 '22 18:07 kikimych

@arfio Apologies for the lack of response. Can you please check if your issue still exists with the latest ROCm 6.2? If resolved, please close the ticket. Thanks!

ppanchad-amd avatar Aug 27 '24 18:08 ppanchad-amd

@arfio Closing ticket. Please feel free to re-open a ticket if you still see the issue with the latest ROCm. Thanks!

ppanchad-amd avatar Sep 24 '24 17:09 ppanchad-amd