
Possible memory leak when reloading a model after disposing of it.

Open stevexbritton opened this issue 1 year ago • 8 comments

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Modified version of the @tensorflow/tfjs npm page (https://www.npmjs.com/package/@tensorflow/tfjs) example
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS 16.6.1
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow.js installed from (npm or script link): script link
  • TensorFlow.js version (use command below): https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/dist/tf.js
  • Browser version: Chrome 131
  • Tensorflow.js Converter Version:

Describe the current behavior

After loading and using a LayersModel, I call model.dispose() and tf.disposeVariables() to release the tf memory. However, if I then reload the model to use it again, memory is leaked: at least 16K JavaScript Array objects are retained. This occurs on every iteration of the loop.

Describe the expected behavior

I would not expect a memory leak; I would expect it to behave the same as if the model were simply reused.

Standalone code to reproduce the issue

The URL "https://vykingsneakerkitnative.s3.eu-central-1.amazonaws.com/SteveTest/tmp/tf-leak-test.html" demonstrates the problem. Steps to demonstrate the leak:

  1. Load the page "https://vykingsneakerkitnative.s3.eu-central-1.amazonaws.com/SteveTest/tmp/tf-leak-test.html" in Chrome and open Developer Tools.
  2. Click "Run Test1" button.
  3. Garbage collect and take a memory sample.
  4. Click "Run Test1" button again.
  5. Garbage collect and take another memory sample.
  6. Comparing sample 2 with sample 1 shows that the number of "Array" objects has increased by about 16K.
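
The source of the test page is not reproduced in this thread, so the following is only a minimal sketch of the load, use, dispose, reload pattern that Test1 is described as exercising. It is an assumption about the page's logic, not its actual code. The function name `runReloadLoop`, the `modelUrl` parameter, and the dummy all-zeros input are hypothetical; `tf` is passed in so the same function works with the global from the script tag or an imported module.

```javascript
// Sketch of the "Test1" pattern: load, predict, dispose, then reload.
// `tf` is the TensorFlow.js namespace; `modelUrl` is a hypothetical URL
// pointing at a LayersModel's model.json (not the URL used by the test page).
async function runReloadLoop(tf, modelUrl, iterations) {
  const tensorCounts = [];
  for (let i = 0; i < iterations; i++) {
    // A fresh model is loaded on every iteration.
    const model = await tf.loadLayersModel(modelUrl);

    // Replace unknown dimensions (null, e.g. the batch size) with 1
    // so that a dummy input tensor can be built.
    const shape = model.inputs[0].shape.map((d) => (d == null ? 1 : d));
    const input = tf.zeros(shape);
    const output = model.predict(input);

    // Release everything TensorFlow.js owns before the next reload.
    input.dispose();
    tf.dispose(output); // handles a single tensor or an array of tensors
    model.dispose();
    tf.disposeVariables();

    // Record the live tensor count reported by TensorFlow.js.
    tensorCounts.push(tf.memory().numTensors);
  }
  return tensorCounts;
}
```

Calling `await runReloadLoop(tf, modelUrl, 10)` and then taking heap snapshots before and after would be one way to reproduce the comparison in the steps above. The returned tensor counts only reflect what `tf.memory()` tracks, not JavaScript heap objects.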

Steps to demonstrate model reuse with minimal memory growth:

  1. Load the page in Chrome and open Developer Tools (or reload the page).
  2. Click "Run Test2" button.
  3. Garbage collect and take a memory sample.
  4. Click "Run Test2" button again.
  5. Garbage collect and take another memory sample.
  6. Comparing sample 2 with sample 1 shows that the number of "Array" objects has increased by only about 96.
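
For contrast, the model-reuse pattern that Test2 is described as exercising can be sketched as below. Again this is an assumption about the page's logic rather than its actual source. `runReuseLoop`, `modelUrl`, and the all-zeros input are hypothetical names and choices made for illustration.

```javascript
// Sketch of the "Test2" pattern: load the model once and reuse it.
// `tf` is the TensorFlow.js namespace; `modelUrl` is a hypothetical URL
// pointing at a LayersModel's model.json.
async function runReuseLoop(tf, modelUrl, iterations) {
  // The model is loaded a single time, outside the loop.
  const model = await tf.loadLayersModel(modelUrl);
  const shape = model.inputs[0].shape.map((d) => (d == null ? 1 : d));

  const tensorCounts = [];
  for (let i = 0; i < iterations; i++) {
    const input = tf.zeros(shape);
    const output = model.predict(input);

    // Only the per-iteration tensors are released; the model stays loaded.
    input.dispose();
    tf.dispose(output);

    tensorCounts.push(tf.memory().numTensors);
  }

  // Dispose of the model once, after the final iteration.
  model.dispose();
  tf.disposeVariables();
  return tensorCounts;
}
```

The only structural difference from the reload pattern is where `tf.loadLayersModel` and `model.dispose()` sit relative to the loop, which is the variable the two tests are meant to isolate.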


stevexbritton avatar Nov 25 '24 16:11 stevexbritton

Hi @shmishra99, Thank you for assigning this to yourself. Is there anything more you need from me to progress this? Steve

stevexbritton avatar Dec 03 '24 10:12 stevexbritton

Hi @shmishra99, I'm confused. Are you investigating this issue or have you assigned it to yourself for some other reason?

stevexbritton avatar Dec 10 '24 17:12 stevexbritton

Hi @stevexbritton ,

Apologies for the late response.

I was testing the link you shared. For me, both runs of Test 1 show the same output, with no increase in the size of the array.

I'm not sure about Test 2. It's giving the same number of tensors with increased tensor values.

For Test1 output:

image

For Test2 output:

image

Can you please confirm whether I'm getting the same output as you are, and explain how this indicates a memory leak?

Please let me know if I'm missing anything. Thank you!

shmishra99 avatar Dec 11 '24 09:12 shmishra99

Hi, It's not the number of tensors left in the GPU that's the issue. It's the number of JavaScript Array objects that are not garbage collected.

This image shows the procedure for Test1 holding onto 16K Array objects:

Screenshot 2024-12-13 at 13 22 03

This image shows the procedure for Test2 holding onto only 96 Array objects; the only difference is that we do not dispose of and reload the model on each iteration:

Screenshot 2024-12-13 at 13 28 08
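
To make the distinction between tensor counts and JavaScript heap growth concrete, here is a rough sketch that samples both. `takeMemorySample` and `diffSamples` are hypothetical helper names. `tf.memory()` is the TensorFlow.js call that reports live tensors; `performance.memory` is a non-standard, Chrome-only property, so the sketch treats it as optional. Neither value counts Array objects directly, so a DevTools heap snapshot is still needed to see those.

```javascript
// Sample TensorFlow.js-tracked memory alongside the JS heap size.
// `perf` defaults to the global performance object; performance.memory is
// non-standard and Chrome-only, so jsHeapBytes is null where it is missing.
function takeMemorySample(tf, perf = globalThis.performance) {
  const { numTensors, numBytes } = tf.memory();
  const jsHeapBytes = perf && perf.memory ? perf.memory.usedJSHeapSize : null;
  return { numTensors, numBytes, jsHeapBytes };
}

// Difference between two samples; the heap delta is null if either
// sample could not read the JS heap size.
function diffSamples(before, after) {
  const heapKnown = before.jsHeapBytes != null && after.jsHeapBytes != null;
  return {
    numTensors: after.numTensors - before.numTensors,
    numBytes: after.numBytes - before.numBytes,
    jsHeapBytes: heapKnown ? after.jsHeapBytes - before.jsHeapBytes : null,
  };
}
```

If the tensor delta stays at zero between runs while the heap delta keeps growing, that would be consistent with the behaviour described here: the tensors are released, but some JavaScript objects are still retained.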

stevexbritton avatar Dec 13 '24 13:12 stevexbritton

Any further updates?

stevexbritton avatar Dec 23 '24 13:12 stevexbritton

Hi @shmishra99, would you please respond to this, even if it's to say you can't look at it at the moment. I don't think just ignoring it is acceptable once you've assigned it to yourself.

stevexbritton avatar Jan 08 '25 11:01 stevexbritton

Hi @stevexbritton ,

Thank you for reaching out. I apologize for the delay in responding. I've tested the code snippet you provided, and I've noticed that the array size is increasing with each run, even after disposing of the model and tensors. Your code flow seems correct to me, but I am not sure why this is happening. I will discuss this issue internally next week and provide an update.

Here is the console snapshot after each run:

Snapshot1:

image

Snapshot2:

image

Snapshot3:

image

Thank You!!

shmishra99 avatar Jan 08 '25 16:01 shmishra99

Hi @stevexbritton ,

Thank you for bringing this issue to our attention.

After investigating with our development team, we've identified a potential minor memory leak that may occur after disposing of the model. We are actively working on a fix for this issue and will provide you with an update as soon as it is resolved.

We appreciate you taking the time to report this and for providing a minimal reproducible code sample to assist us in our investigation.

shmishra99 avatar Jan 18 '25 14:01 shmishra99