
When training finishes, the CUDA memory is still occupied. How do I free it?

Open alpttex19 opened this issue 1 year ago • 1 comments

This is for bugs only

Did you already ask in the discord?

Yes/No

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes/No

Describe the bug

alpttex19, Oct 30 '24 07:10

I have the same problem. cleanup() doesn't free the memory: https://github.com/ostris/ai-toolkit/blob/58f9d01c2bd7edfb5de0ff61dd564481705cdb89/toolkit/job.py#L44 I added the following after it: del job.process; del job; gc.collect(); torch.cuda.empty_cache(). This reduces the memory usage, but some memory still leaks.
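For anyone trying to narrow this down, here is a minimal sketch of that cleanup sequence plus a per-device report of what PyTorch still holds. The helper name `release_cuda_memory` is hypothetical and not part of ai-toolkit; the `torch.cuda` calls are standard PyTorch. The import guard is only there so the snippet also loads on a machine without PyTorch.

```python
import gc

try:
    import torch
except ImportError:  # keeps the sketch importable where PyTorch is not installed
    torch = None


def release_cuda_memory():
    """Run the garbage collector, empty PyTorch's CUDA cache, and report usage.

    Hypothetical helper, not part of ai-toolkit.
    Returns {device_index: (allocated_bytes, reserved_bytes)},
    or an empty dict if CUDA is not available.
    """
    # Free objects that are only kept alive by reference cycles.
    gc.collect()
    if torch is None or not torch.cuda.is_available():
        return {}
    # Hand cached, currently unused blocks back to the driver.
    torch.cuda.empty_cache()
    return {
        i: (torch.cuda.memory_allocated(i), torch.cuda.memory_reserved(i))
        for i in range(torch.cuda.device_count())
    }


# Drop your own references first, for example:
#   del job.process
#   del job
# then inspect what is left:
#   print(release_cuda_memory())
```

If the allocated figure stays above zero, some tensor is still referenced somewhere (a model, optimizer state, a cached batch), and `empty_cache()` cannot release it. Note also that the CUDA context itself is not counted in these numbers and is only released when the process exits, so `nvidia-smi` will usually keep showing some usage for a live process even after a successful cleanup.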

Another memory problem: I am trying to run two trainings in parallel on two GPUs, but the one using cuda:1 always allocates some memory on cuda:0 when it executes this line: https://github.com/ostris/ai-toolkit/blob/58f9d01c2bd7edfb5de0ff61dd564481705cdb89/extensions_built_in/sd_trainer/SDTrainer.py#L1635
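One common workaround for a process touching a GPU it should not use is to hide the other GPUs from it with the `CUDA_VISIBLE_DEVICES` environment variable, set before the process starts. The sketch below assumes each training runs as its own process; the helper names and the example command are hypothetical, not taken from ai-toolkit.

```python
import os
import subprocess


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment in which only one GPU is visible.

    Hypothetical helper. base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_on_gpu(gpu_index, command):
    """Start `command` (a list of arguments) as a child process pinned to one GPU."""
    return subprocess.Popen(command, env=env_for_gpu(gpu_index))


# Example with a placeholder command; replace it with your own training invocation:
#   first = launch_on_gpu(0, ["python", "train_script.py", "config_a.yaml"])
#   second = launch_on_gpu(1, ["python", "train_script.py", "config_b.yaml"])
#   first.wait(); second.wait()
```

With this setup, each child process sees a single GPU, which it addresses as cuda:0 regardless of the physical index, so the device setting in the second config would need to be cuda:0 rather than cuda:1. Whether this also removes the stray allocation at the line linked above has not been verified here.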

I suspect the problem is with the bitsandbytes optimizer, although I'm not sure.

maflx, Oct 30 '24 14:10