profiler icon indicating copy to clipboard operation
profiler copied to clipboard

TF.timeline JSON 'ts' field uses relative time in TF 2.x, whereas it used absolute time in TF 1.x

Open crlishka opened this issue 5 years ago • 4 comments

With TF 1.x, we were able to convert the tensorflow.timeline JSON file to a form (CSV) that was usable in VTune, as documented in: https://software.intel.com/content/www/us/en/develop/articles/profiling-tensorflow-workloads-with-intel-vtune-amplifier.html. This was possible because the TF 1.x JSON format used an absolute time from the host server for the 'ts' (start time) field, as seen in this example traceEvents entry: { "ph": "X", "cat": "Op", "name": "_MklToTf", "pid": 3, "tid": 0, "ts": 1607035402039614, "dur": 3, "args": { "name": "Mkl2Tf/_51", "op": "_MklToTf", "input0": "adam_optimizer/gradients/conv2/add_grad/Reshape_1", "input1": "adam_optimizer/gradients/conv2/add_grad/Reshape_1:1" } }

With TF Profiler 2.x, it looks like the 'ts' field has become a relative time (relative to the start of the model run), at least when running on CPU: {'pid': 49, 'tid': 3623876352, 'ts': 52335.771, 'name': 'gradient_tape/sequential/dense_1/MatMul:_MklMatMul', 'ph': 'X', 'dur': 5.139, 'args': {'group_id': '4'}}

Questions:

  • Was this change in the 'ts' format, from absolute (in TF 1.x) to relative (in TF 2.x) intentional?
  • If it was intentional, where in the TF profiling data can we get the absolute start time (i.e. the host server's time) of the profile run, to add to the relative time in the 'ts' field?

Thanks for your help! -- ChrisL

crlishka avatar Dec 04 '20 00:12 crlishka

Yes, using relative time is the intended behavior in TF 2.x profiler. Note that the trace analysis in TF 2.x profiler is completely different from the TF 1.x profiler, so we won't expect the behavior to be the same.

To get the absolute time, I think you can try hacking this line: https://github.com/tensorflow/tensorflow/blob/5855c1c8ae2d2b36bf9d6f906b717046bb4afb36/tensorflow/core/profiler/convert/post_process_single_host_xplane.cc#L48

qiuminxu avatar Dec 22 '20 00:12 qiuminxu

This refactor changes the semantic https://github.com/tensorflow/tensorflow/commit/701c1e97317ace357621b7f0bd4a2e427f16ed42#r144411330

372046933 avatar Jul 19 '24 06:07 372046933

@372046933 What are the noticeable effects of this semantic change? It looks like that change was made over 2 years ago, so I'm wanting to verify if this is causing any issues for you. From my understanding of the code, it looks like this may cause traces to not be aligned to 0, but I haven't personally seen this behavior.

Matt-Hurd avatar Dec 19 '24 21:12 Matt-Hurd

@372046933 What are the noticeable effects of this semantic change? It looks like that change was made over 2 years ago, so I'm wanting to verify if this is causing any issues for you. From my understanding of the code, it looks like this may cause traces to not be aligned to 0, but I haven't personally seen this behavior.

I tensorflow Session API, We use the following pattern to collect timeline json.

run_options = tf.compat.v1.RunOptions(trace_level=tf.compat.v1.RunOptions.FULL_TRACE)
run_metadata = tf.compat.v1.RunMetadata()
sess.run(foo_bar, options=run_options, run_metadata=run_metadata)
tl = timeline.Timeline(run_metadata.step_stats)
ctf = tl.generate_chrome_trace_format()

Above generated json cannot be displayed in Chrome after the semantic change. We patched tensorflow to resolve the issue.

372046933 avatar Dec 20 '24 03:12 372046933