
Train process is blocked when kineto is processing traceEvents

Open staugust opened this issue 1 year ago • 1 comments

When using on-demand profiling via dynolog and kineto, we noticed that when a profiling request is configured with iterations, the last profiling iteration takes longer than the other profiling iterations. The training process is blocked at optimizer.step(), which calls step in kineto; within performRunLoop, libkineto::api().client()->stop() accounts for most of the time.

At the same time, processTraceInternal is executed asynchronously in performRunLoop, so it does not block the PyTorch training process.
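To illustrate the difference between the two behaviours described above, here is a minimal Python sketch. It is not kineto code and does not reflect how any fix was implemented: `slow_stop`, `step_blocking`, and `step_offloaded` are hypothetical names, and the sleep is only a stand-in for a slow stop call. It shows a slow call run inline on the caller's thread (which makes the caller wait) versus the same call handed to a worker thread.

```python
import threading
import time

# Name of the thread that plays the role of the training loop in this sketch.
caller = threading.current_thread().name


def slow_stop(log):
    """Hypothetical stand-in for a slow profiler-stop call."""
    time.sleep(0.05)
    # Record which thread actually paid for the slow call.
    log.append(threading.current_thread().name)


def step_blocking(log):
    # The slow call runs inline, so the caller waits until it finishes.
    slow_stop(log)


def step_offloaded(log):
    # The slow call is handed to a worker thread; the caller returns at once.
    worker = threading.Thread(target=slow_stop, args=(log,), name="profiler-worker")
    worker.start()
    return worker


blocking_log, offloaded_log = [], []
step_blocking(blocking_log)
worker = step_offloaded(offloaded_log)
worker.join()  # joined here only so the example can inspect the result
```

In the blocking variant the slow call is recorded on the caller's thread, which corresponds to the stall seen at optimizer.step(); in the offloaded variant it is recorded on the worker thread instead.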

I'm wondering whether there is a plan to fix this performance issue so that on-demand profiling adds minimal overhead to the PyTorch training process. It would be very nice if there is already a plan or a proposal. If not, I'd like to make a proposal later.

staugust, Jun 19 '24 10:06

cc @briancoutinho

sraikund16, Jun 20 '24 15:06

Now that PR #966 is merged, I think this issue can be closed. Thank you very much @sraikund16 @aaronenyeshi @briancoutinho @sanrise

staugust, Oct 16 '24 02:10