
[enhancement]: option to unload from memory

Open aaronbolton opened this issue 1 year ago • 8 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Contact Details

No response

What should this feature add?

a command line option to unload model from RAM after a defined period of time
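
A minimal sketch of what such an idle-timeout unload could look like, using a resettable timer (illustrative only; the unload callback is a hypothetical stand-in, not an InvokeAI API):

```python
import threading


class IdleUnloader:
    """Fire an unload callback after `timeout` seconds with no activity (sketch)."""

    def __init__(self, unload_fn, timeout=300.0):
        self.unload_fn = unload_fn  # callback that frees the model from RAM/VRAM
        self.timeout = timeout
        self._timer = None
        self._lock = threading.Lock()

    def touch(self):
        """Call on every generation; restarts the idle countdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.timeout, self.unload_fn)
            self._timer.daemon = True
            self._timer.start()
```

Each completed generation would call `touch()`; after 300 idle seconds the callback fires and frees the model, mirroring Ollama's default 5-minute keep-alive.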

Alternatives

Running as a container and using Sablier to shut down the container after some time. The downside is that Sablier only sees traffic through the web interface, so the container can be shut down even while jobs are still running.

Additional Content

No response

aaronbolton avatar Sep 14 '24 12:09 aaronbolton

What's the use case for this?

psychedelicious avatar Sep 15 '24 01:09 psychedelicious

It would help free up memory for other applications such as Ollama. Ollama has a similar feature where it unloads models from memory after 5 minutes by default, but this is configurable.
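
For comparison, Ollama's keep-alive behavior is controlled by the `OLLAMA_KEEP_ALIVE` environment variable (values below are illustrative):

```shell
# Keep models loaded for 10 minutes instead of the 5-minute default
OLLAMA_KEEP_ALIVE=10m ollama serve

# Unload immediately after each request
OLLAMA_KEEP_ALIVE=0 ollama serve
```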

aaronbolton avatar Sep 15 '24 07:09 aaronbolton

Invoke already (partially) unloads models from VRAM if you set memory bounds. However, this only seems to work properly when you let the generation stop on its own, not if you abort the queue. I reported this some time ago, but the devs seem to be busy with the huge number of reported issues: https://github.com/invoke-ai/InvokeAI/issues/6759

systemofapwne avatar Sep 29 '24 13:09 systemofapwne

@aaronbolton you might like my new Ollama Node for help expanding prompts in Invoke.

By the way, I have also noticed that Invoke doesn't seem to free up memory after a generation, so Ollama runs slower on subsequent generations, probably because it is offloading most processing to the CPU. It would be great if Invoke had an option to fully free the memory after each generation, for use cases like this. (My node has a toggle to unload the model from Ollama after generating the expanded prompt.)

Jonseed avatar Oct 02 '24 00:10 Jonseed

I would like to second that. Ollama unloads models from VRAM after a timeout (default 5 minutes). It would be nice for Invoke to unload models from the GPU once generation is completed: wait for a timeout, and if there has been no further generation, unload the model completely. Currently I have to restart Invoke just to unload the model from VRAM.

fahadshery avatar Oct 03 '24 07:10 fahadshery

You can set lazy_offload: false in the invokeai.yaml config file and the app will actively offload models.
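
For reference, the relevant `invokeai.yaml` settings might look like this (key names per the InvokeAI configuration docs; the numeric values are illustrative, not recommendations):

```yaml
# invokeai.yaml (fragment)
lazy_offload: false  # actively offload models from VRAM instead of waiting
ram: 7.5             # model cache size in RAM, in GB (illustrative value)
vram: 0.25           # VRAM reserved for the model cache, in GB (illustrative value)
```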

psychedelicious avatar Oct 03 '24 08:10 psychedelicious

You can set lazy_offload: false in the invokeai.yaml config file and the app will actively offload models.

It still gobbles up some VRAM when sitting idle. Is it not possible to fully unload?

fahadshery avatar Oct 03 '24 19:10 fahadshery

It only offloads down to the configured VRAM cache setting. There's no way at the moment to forcibly offload all models.

I think it'd be fairly straightforward to add an endpoint to do this, open to contributions.
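
A sketch of what the handler behind such an endpoint could look like (the route, cache class, and method names here are hypothetical stand-ins, not InvokeAI's actual model manager API):

```python
class ModelCache:
    """Toy stand-in for a model cache; InvokeAI's real cache differs."""

    def __init__(self):
        self._models = {}

    def put(self, key, model):
        self._models[key] = model

    def clear(self):
        """Drop every cached model so RAM/VRAM can be reclaimed."""
        self._models.clear()
        # A real implementation would also release framework-level
        # allocations here (e.g. torch.cuda.empty_cache() for CUDA).


def handle_empty_model_cache(cache):
    """Handler body for a hypothetical POST /api/v1/models/empty_cache route."""
    cache.clear()
    return {"status": "ok"}
```

Wired into the existing API router, this would give scripts and reverse proxies a way to force a full unload on demand.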

psychedelicious avatar Oct 03 '24 19:10 psychedelicious

There is a button to clear the model cache in the queue tab

psychedelicious avatar Aug 15 '25 08:08 psychedelicious