TensorFlow profiler running into OOM issue on GPU
Running the TensorFlow profiler for longer than 10 seconds results in an OOM error that crashes the inference process, and the profiler returns DEADLINE_EXCEEDED. Is there any way to limit the sampling rate or reduce the amount of information being collected, so that the process does not crash?
Here is the code that I run:
import tensorflow as tf
tf.profiler.experimental.client.trace("grpc://localhost:3222", "profiles", 30000)
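One workaround to try is to split the 30 s capture into several back-to-back traces that each stay under the ~10 s that currently works. This is a sketch under an assumption not confirmed in this thread: that the OOM comes from a single long trace buffering all of its events, so peak memory scales with trace duration. `plan_windows` and `capture_in_windows` are hypothetical helpers; only `tf.profiler.experimental.client.trace` is a real TensorFlow API.

```python
from typing import List


def plan_windows(total_ms: int, window_ms: int) -> List[int]:
    """Split total_ms into consecutive durations of at most window_ms each."""
    if total_ms <= 0 or window_ms <= 0:
        raise ValueError("total_ms and window_ms must be positive")
    full, remainder = divmod(total_ms, window_ms)
    windows = [window_ms] * full
    if remainder:
        windows.append(remainder)
    return windows


def capture_in_windows(service_addr: str, logdir: str, total_ms: int,
                       window_ms: int = 8000) -> None:
    """Hypothetical helper: request one short remote trace per window."""
    # Imported lazily so plan_windows can be used without TensorFlow installed.
    import tensorflow as tf

    for duration_ms in plan_windows(total_ms, window_ms):
        # Blocks until this window's trace has been collected and saved under logdir.
        tf.profiler.experimental.client.trace(service_addr, logdir, duration_ms)
```

For example, `capture_in_windows("grpc://localhost:3222", "profiles", total_ms=30000)` would request four traces (8 s, 8 s, 8 s, 6 s). Expect short gaps between windows, and this only helps if the profiler releases its buffers after each trace is written.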
Hi TensorFlow team,
Can you help us with the above? Is there a way to sample TensorFlow profiling on GPUs? This is blocking us from collecting any trace longer than 10 s.
Have you tried doing this with Keras callbacks, using something like this:
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=fn_args.model_run_dir, profile_batch=(40, 80),  # profile only batches 40 to 80
    histogram_freq=1, write_steps_per_second=True, write_graph=False)
and passing the callback to model.fit via callbacks=[tensorboard_callback]?
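As far as I can tell, the `profile_batch` option of the TensorBoard callback only profiles training batches inside `model.fit`. For an inference loop, a comparable idea is to bracket a bounded range of steps with `tf.profiler.experimental.start` and `tf.profiler.experimental.stop`, so the profiler only records that window. A minimal sketch, assuming `predict_fn` and `batches` stand in for your own model call and input pipeline (both hypothetical names, as are `in_profile_window` and `run_with_profiling`):

```python
from typing import Callable, Iterable, Tuple


def in_profile_window(step: int, profile_range: Tuple[int, int]) -> bool:
    """Return True if step falls inside the inclusive profiling range."""
    start, stop = profile_range
    return start <= step <= stop


def run_with_profiling(predict_fn: Callable, batches: Iterable, logdir: str,
                       profile_range: Tuple[int, int] = (40, 80)) -> None:
    """Hypothetical loop: profile only the steps inside profile_range."""
    # Imported lazily so in_profile_window works without TensorFlow installed.
    import tensorflow as tf

    active = False
    try:
        for step, batch in enumerate(batches):
            profiling = in_profile_window(step, profile_range)
            if profiling and not active:
                tf.profiler.experimental.start(logdir)
                active = True
            elif not profiling and active:
                # Past the end of the window: stop and write the profile.
                tf.profiler.experimental.stop()
                active = False
            if active:
                # Step marker (_r=1) so TensorBoard can group events per step.
                with tf.profiler.experimental.Trace("inference", step_num=step, _r=1):
                    predict_fn(batch)
            else:
                predict_fn(batch)
    finally:
        # Stop the profiler if the input ran out (or an error occurred) inside the window.
        if active:
            tf.profiler.experimental.stop()
```

Keeping the window to a few dozen steps mirrors `profile_batch=(40, 80)` and bounds how much the profiler has to hold in memory at once.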
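@rahul-fnu The profiler client has no sampling-rate setting, but you can reduce the amount of information it collects by passing a tf.profiler.experimental.ProfilerOptions object through the options parameter of tf.profiler.experimental.client.trace, for example by lowering host_tracer_level and keeping Python tracing off. Use:
tf.profiler.experimental.client.trace("grpc://localhost:3222", "profiles", 30000, options=tf.profiler.experimental.ProfilerOptions(host_tracer_level=1, python_tracer_level=0, device_tracer_level=1))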
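A slightly fuller sketch of reducing trace detail, assuming TensorFlow 2.x, where `client.trace` accepts an `options` argument of type `ProfilerOptions`. `host_tracer_level=1` keeps only critical host events (the default is 2), `python_tracer_level=0` leaves Python call tracing off, and `device_tracer_level=1` keeps GPU tracing on; setting it to 0 would drop the GPU kernel events you are presumably after. `REDUCED_TRACER_LEVELS` and `trace_with_reduced_detail` are hypothetical names, and whether this alone avoids the OOM for a 30 s trace is untested.

```python
from typing import Optional

# Hypothetical preset; the keys are the ProfilerOptions fields.
REDUCED_TRACER_LEVELS = {
    "host_tracer_level": 1,    # 1 = critical host events only (default is 2)
    "python_tracer_level": 0,  # 0 = Python function-call tracing disabled
    "device_tracer_level": 1,  # 1 = keep GPU (device) tracing enabled
}


def trace_with_reduced_detail(service_addr: str, logdir: str, duration_ms: int,
                              levels: Optional[dict] = None) -> None:
    """Hypothetical wrapper: remote trace with reduced host-side detail."""
    # Imported lazily so the preset can be inspected without TensorFlow installed.
    import tensorflow as tf

    options = tf.profiler.experimental.ProfilerOptions(**(levels or REDUCED_TRACER_LEVELS))
    tf.profiler.experimental.client.trace(service_addr, logdir, duration_ms, options=options)
```

For example, `trace_with_reduced_detail("grpc://localhost:3222", "profiles", 10000)` requests a 10 s capture with these levels.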