Augusto Yao
When profiling with `torch.profiler.profile`, the generated JSON file has a section called `distributedInfo`, shown below:

```json
{
  "distributedInfo": {"backend": "nccl", "rank": 0, "world_size": 2}
}
```

But there's no...
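As a minimal sketch of working with that section, the snippet below parses a trace with the structure shown above and reads the `distributedInfo` fields. The inline JSON string is a stand-in for the trace file's contents; in practice you would `json.load` the file that `torch.profiler` exported.

```python
import json

# Stand-in for the exported trace file; the top-level "distributedInfo"
# section follows the structure shown in the description above.
trace_text = '{"distributedInfo": {"backend": "nccl", "rank": 0, "world_size": 2}}'

trace = json.loads(trace_text)
info = trace["distributedInfo"]

# Extract the distributed-training metadata recorded by the profiler.
print(info["backend"], info["rank"], info["world_size"])
```

With a real trace file, replace `json.loads(trace_text)` with `json.load(open(path))` on the profiler's output path.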
This fixes issue #953. It makes `libkineto::api().client()->stop()` and `stopTraceInternal` run in `profilerThread_`, so the training process is not blocked.
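The general pattern is to move the slow stop/post-processing work onto the profiler's own thread so the caller returns immediately. A hedged Python analogy (not kineto's actual C++ code; `slow_stop` is a hypothetical stand-in for `stopTraceInternal`):

```python
import threading
import time

def slow_stop():
    # Stand-in for trace post-processing that used to block the caller.
    time.sleep(0.2)
    print("trace stopped")

# Offload the blocking stop to a worker thread, analogous to running
# stopTraceInternal on profilerThread_ instead of the training thread.
worker = threading.Thread(target=slow_stop)
worker.start()

print("training continues")  # the caller is not blocked
worker.join()
```

The training thread only pays the cost of spawning (or signaling) the worker, not the cost of the stop itself.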
When using on-demand profiling via `dynolog` and `kineto`, we noticed that when a profiling request is configured with iterations, the last profiling iteration took noticeably more time than the other profiling iterations. The train...
Recently, we deployed dynolog in our GPU cluster to collect trace files via kineto on-demand profiling. It requires extra effort to collect trace files dumped to local storage via...