TensorFlow profiler running into OOM issue on GPU

Running the TensorFlow profiler for longer than 10 seconds results in an OOM error that crashes the inference process, and the profiler returns DEADLINE_EXCEEDED. Is there any way to limit the sampling rate, or otherwise reduce the amount of information being collected, to avoid crashing the process?

Here is the code that I run: tf.profiler.experimental.client.trace("grpc://localhost:3222", "profiles", 30000)

Aug 10 '23 04:08 rahul-fnu

Hi TensorFlow team,

Can you help us with the above? Is there a way to sample TensorFlow profiling on GPUs? This is blocking us from collecting any traces longer than 10 s.

Aug 11 '23 18:08 ndeepesh

Have you tried doing this with Keras callbacks, using something like this:

tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=fn_args.model_run_dir, profile_batch=(40, 80), histogram_freq=1,
    write_steps_per_second=True, write_graph=False)

and then passing the callback to model.fit?
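
For illustration, a minimal sketch of wiring such a callback into model.fit might look like the following. The helper names tensorboard_kwargs and fit_with_profiling are hypothetical, and model, x, and y are placeholders for a compiled Keras model and its training data. The point of profile_batch=(40, 80) is that only batches 40 through 80 are profiled, which bounds how much trace data is gathered.

```python
def tensorboard_kwargs(log_dir, first_batch=40, last_batch=80):
    """Build the TensorBoard callback arguments used in this thread.

    Hypothetical helper: it only assembles a dict, so it can be inspected
    without TensorFlow installed.
    """
    return {
        "log_dir": log_dir,
        # Profile only this inclusive range of batches instead of the whole run.
        "profile_batch": (first_batch, last_batch),
        "histogram_freq": 1,
        "write_steps_per_second": True,
        "write_graph": False,
    }


def fit_with_profiling(model, x, y, log_dir, epochs=1):
    """Train `model` with the profiling callback attached (hypothetical helper)."""
    import tensorflow as tf  # imported lazily; only needed when training

    callback = tf.keras.callbacks.TensorBoard(**tensorboard_kwargs(log_dir))
    # The callback is passed through the `callbacks` argument of model.fit.
    return model.fit(x, y, epochs=epochs, callbacks=[callback])
```

Note that this route applies to training through model.fit; it is an assumption that it carries over to a standalone inference process.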

Aug 27 '23 10:08 pritamdodeja

@rahul-fnu To reduce the amount of information collected by the TensorFlow profiler, you can pass a tf.profiler.experimental.ProfilerOptions object through the options argument of tf.profiler.experimental.client.trace and lower the tracer levels. Use: tf.profiler.experimental.client.trace("grpc://localhost:3222", "profiles", 30000, options=tf.profiler.experimental.ProfilerOptions(host_tracer_level=1, python_tracer_level=0, device_tracer_level=1))
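
As a hedged sketch of how the collected detail can be reduced: the documented tf.profiler.experimental.client.trace function takes service_addr, logdir, duration_ms, and an options argument of type tf.profiler.experimental.ProfilerOptions, whose fields are host_tracer_level (1 to 3), python_tracer_level (0 or 1), device_tracer_level (0 or 1), and delay_ms. It documents no sampling-rate argument. The helper names build_trace_config and run_trace below are hypothetical, and whether lower tracer levels are enough to avoid the OOM here is an assumption, not something confirmed in this thread.

```python
def build_trace_config(duration_ms, host_tracer_level=1,
                       python_tracer_level=0, device_tracer_level=1):
    """Gather the trace arguments in plain Python (hypothetical helper)."""
    return {
        "service_addr": "grpc://localhost:3222",  # address from the question
        "logdir": "profiles",                     # log directory from the question
        "duration_ms": duration_ms,
        "options": {
            # 1 = critical host events only (the default is 2, 3 is verbose).
            "host_tracer_level": host_tracer_level,
            # 0 = do not trace Python function calls.
            "python_tracer_level": python_tracer_level,
            # 1 = device (GPU/TPU) tracing on, 0 = off.
            "device_tracer_level": device_tracer_level,
        },
    }


def run_trace(config):
    """Request an on-demand trace from a running profiler server."""
    import tensorflow as tf  # imported lazily; only needed for the real call

    options = tf.profiler.experimental.ProfilerOptions(**config["options"])
    tf.profiler.experimental.client.trace(
        config["service_addr"],
        config["logdir"],
        config["duration_ms"],
        options=options,
    )


config = build_trace_config(duration_ms=30000)
# run_trace(config)  # requires a profiler server listening at service_addr
```

Setting device_tracer_level=0 would drop GPU events entirely, so it only helps if host-side traces are sufficient.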

Feb 07 '24 18:02 Rahulraj0308