Idle Time
Hi everyone, I started using the TensorFlow profiler, which I found very useful, with the tutorial (https://github.com/tensorflow/tensorboard/blob/master/docs/tensorboard_profiling_keras.ipynb) and with a custom model. In both cases, the idle time in TensorFlow Stats is about 90%. Is this normal? Why is the "Include idle time" option not the default?
Is the idle time in TensorFlow Stats on the host or on the device? If it is on the host, it is probably okay: it means that your model doesn't use the host much. If it is on the device, it means that your accelerator is largely unused, and you probably want to increase its utilization. "Include idle time" is not the default because many users of TensorFlow Stats want to visualize the relative timing of actual ops. If we include the idle time, the actual-op portions may become too small to visualize clearly.
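To make the point about visualization concrete, here is a minimal sketch of how the per-op percentages shrink once idle time is counted. The op names and timings are hypothetical numbers chosen for illustration, not values from any real profile, and `op_percentages` is an illustrative helper, not part of the profiler.

```python
def op_percentages(op_times_us, idle_time_us=0.0):
    """Return each entry's share of the total time, in percent."""
    entries = dict(op_times_us)
    if idle_time_us > 0:
        # Treat idle time as one more entry, as "Include idle time" would.
        entries["IDLE"] = idle_time_us
    total = sum(entries.values())
    return {name: 100.0 * t / total for name, t in entries.items()}


# Hypothetical profile: 100 us of actual op time inside a 1000 us window,
# i.e. 90% idle, similar to the situation described in the question.
op_times_us = {"MatMul": 60.0, "Conv2D": 30.0, "Relu": 10.0}
idle_time_us = 900.0

without_idle = op_percentages(op_times_us)
with_idle = op_percentages(op_times_us, idle_time_us)

for name in op_times_us:
    print(f"{name}: {without_idle[name]:.1f}% without idle, "
          f"{with_idle[name]:.1f}% with idle")
print(f"IDLE: {with_idle['IDLE']:.1f}%")
```

With these made-up numbers, every op's share drops by a factor of ten once idle time is included, which is why the ops become hard to tell apart in a chart.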
Thanks for the quick response. It was on both sides: the device and the host. Unfortunately, I can't provide logs right now, but I will as soon as possible, hoping that you can suggest how to increase GPU utilization. I have another question: is there a guide, conference talk, course or something else where I can learn to interpret the logs and the trace view of the profiler? That way I can try to optimize my code while asking here as little as possible.
We are working on the guide, hopefully to be available soon. Thanks -ck
As for the guide, you can look at the current one (https://www.tensorflow.org/guide/profiler#profiler_tools). We are working on a more detailed one with examples.
For your issue, what is the step-time breakdown shown on the Overview Page?
We are working on the guide, hopefully to be available soon. Thanks -ck
Ok great!

This is my full profile log: profile_logs.zip. I think the problem is that my model is very small, so most of the time is spent launching kernels. This is also supported by the fact that training on the CPU alone is faster. Is there a way to use the GPU effectively? Or, for this model, should I give up and use only the CPU? Thank you very much.
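The kernel-launch hypothesis can be illustrated with a toy model. This is only a sketch under strong assumptions: the host issues one kernel every `launch_us` microseconds, kernels run back to back on the device, and nothing else (input pipeline, memory copies, synchronization) contributes to the step time. The function name and all numbers are hypothetical and are not taken from the attached profile.

```python
def device_busy_fraction(kernel_us, launch_us):
    """Fraction of time the device spends computing in the toy model.

    If a kernel finishes before the host can issue the next one
    (kernel_us < launch_us), the device waits for the remainder of
    the launch interval; otherwise it is always busy.
    """
    return kernel_us / max(kernel_us, launch_us)


# Hypothetical launch interval of 10 us per kernel.
launch_us = 10.0

small_model = device_busy_fraction(kernel_us=1.0, launch_us=launch_us)
large_model = device_busy_fraction(kernel_us=50.0, launch_us=launch_us)

print(f"small kernels: {100 * small_model:.0f}% busy, "
      f"{100 * (1 - small_model):.0f}% idle")
print(f"large kernels: {100 * large_model:.0f}% busy, "
      f"{100 * (1 - large_model):.0f}% idle")
```

In this model, idle time on the device falls only when each kernel does more work per launch (for example through a larger batch size) or when fewer, larger kernels are launched. Whether either applies to a given model has to be checked against its own profile.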
@ckluk any more advice here? I am facing the same issue. The profiler tool is great, but it is very hard to optimise the kernel launch time. Are there any more guides in this area?
@ckluk-github Hello, I am training the model on the CPU and the idle time is 97.5%. Is that normal? Thanks.
@ckluk-github, I am repeating the question from @siwang2011: if we are using the CPU for training, is an idle time of ~90% normal?