
GPU memory not being released

Open ftorres-ucf opened this issue 4 years ago • 2 comments

Describe the bug

When predict.py is run, sometimes it releases the GPU memory it allocated and sometimes it does not. This causes problems when trying to run multiple models in serial execution.

To Reproduce

This error is not easily reproducible, and I also do not know whether the files you have available can reproduce it.

The error seems to happen most often when running model e with the GPU specified, as in: predict.py -g 0 -c Temp_Data/Candidates/ -m e

Expected behavior

The expected behavior is for the program to run to completion and then exit.

Screenshots

Screenshot_20210830_144805: it gets stuck on the last line shown in the terminal.

Additional context

I always like to find solutions for these problems before posting a bug. My best guess as to what's happening is that the TensorFlow module does not release the memory allocated in the get_model call on line 79.
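For completeness, the usual in-process cleanup I am aware of looks like the snippet below, but as I understand it TensorFlow keeps its GPU memory pool for the life of the process, so this alone may not fix it:

```python
import gc
import tensorflow as tf

# After prediction finishes, drop Keras' global graph/state and any
# lingering Python references to the model.
tf.keras.backend.clear_session()
gc.collect()
# Caveat: TensorFlow's allocator typically keeps its GPU memory pool until
# the process exits, so this may not hand the memory back to the driver.
```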

By searching for solutions online, I found this: https://stackoverflow.com/questions/39758094/clearing-tensorflow-gpu-memory-after-model-execution

Therefore, by splitting the program into two processes and then joining them, the GPU memory should be released when the worker process exits. I have not yet tried to implement this, but I could open a pull request after trying and fixing it (if it turns out to be the right solution).
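A minimal sketch of the idea (the worker function, its arguments, and the model letters other than e are hypothetical placeholders, not the actual predict.py API):

```python
from multiprocessing import Process

def run_model(model_char, candidates_dir, gpu_id):
    # Hypothetical worker: everything TensorFlow-related is imported and
    # created inside the child process, so the CUDA context and all GPU
    # memory it allocated are torn down when the process exits.
    import os
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    import tensorflow as tf  # imported only inside the worker
    # ... call get_model(model_char), then run prediction over candidates_dir ...

if __name__ == "__main__":
    for model_char in ["a", "b", "e"]:  # one child process per model, run in serial
        p = Process(target=run_model, args=(model_char, "Temp_Data/Candidates/", 0))
        p.start()
        p.join()  # by the time join() returns, the GPU memory is freed
```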

ftorres-ucf commented Aug 30 '21 19:08

Specifically, this is for the tf2 branch.

ftorres-ucf commented Aug 30 '21 19:08

I have seen this happen sometimes. Ideally, we want to stop using the data generator and use tf.data. I will try to work on it this weekend and push it.
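For reference, a rough sketch of that kind of change (the generator and the shapes below are placeholders, not the repository's actual pipeline):

```python
import numpy as np
import tensorflow as tf

def candidate_generator():
    # Placeholder for the existing Python data generator; it would yield
    # one pre-processed candidate array at a time.
    for _ in range(8):
        yield np.zeros((256, 256, 1), dtype=np.float32)

# Wrap the generator in a tf.data pipeline so TensorFlow handles batching
# and prefetching instead of the custom generator class.
dataset = (
    tf.data.Dataset.from_generator(
        candidate_generator,
        output_signature=tf.TensorSpec(shape=(256, 256, 1), dtype=tf.float32),
    )
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

# model.predict(dataset)  # inference then consumes the dataset directly
```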

devanshkv commented Sep 01 '21 02:09