When I finish training, the CUDA memory is still occupied. How can I free the memory?
Describe the bug
I have the same problem. The cleanup() call doesn't free the memory:
https://github.com/ostris/ai-toolkit/blob/58f9d01c2bd7edfb5de0ff61dd564481705cdb89/toolkit/job.py#L44
I have added:

```python
import gc
import torch

del job.process
del job
gc.collect()
torch.cuda.empty_cache()
```

This reduces the memory usage, but I still see a memory leak.
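For reference, this is the small debugging snippet I use (my own code, not part of ai-toolkit) to check how much memory PyTorch still holds after cleanup. `memory_allocated()` shows tensors that are still referenced somewhere, while `memory_reserved()` shows blocks kept by the caching allocator even after `empty_cache()`:

```python
import gc
import torch

def report_cuda_memory(device: int = 0) -> None:
    """Print live vs. cached CUDA memory for one device."""
    gc.collect()
    torch.cuda.empty_cache()
    allocated = torch.cuda.memory_allocated(device) / 1024**2  # tensors still referenced
    reserved = torch.cuda.memory_reserved(device) / 1024**2    # blocks held by the caching allocator
    print(f"cuda:{device} allocated={allocated:.1f} MiB reserved={reserved:.1f} MiB")

report_cuda_memory(0)
```

If `allocated` stays high, something (model, optimizer, dataloader, or a traceback holding references) is still alive and `empty_cache()` cannot release it.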
Another memory problem: when I run two trainings in parallel on two GPUs, the one using cuda:1 always allocates some memory on cuda:0 when executing this line: https://github.com/ostris/ai-toolkit/blob/58f9d01c2bd7edfb5de0ff61dd564481705cdb89/extensions_built_in/sd_trainer/SDTrainer.py#L1635
I suspect the problem is with the bitsandbytes optimizer, although I'm not sure.
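As a workaround for the stray cuda:0 allocation, I launch each training in its own process with `CUDA_VISIBLE_DEVICES` restricted to one GPU, so anything the process imports (bitsandbytes included) can only see the card it is meant to use. A rough sketch, assuming `python run.py <config>` is how you start a job; the script name and config paths are placeholders, adjust to your setup:

```python
import os
import subprocess

# Each process only ever sees one physical GPU (which it addresses as cuda:0),
# so no library it loads can allocate memory on the other card.
jobs = [
    ("0", "config/job_a.yaml"),  # placeholder config paths
    ("1", "config/job_b.yaml"),
]

procs = []
for gpu, config in jobs:
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = gpu
    procs.append(subprocess.Popen(["python", "run.py", config], env=env))

for p in procs:
    p.wait()
```

Setting `CUDA_VISIBLE_DEVICES` in the environment before the process starts is important; changing it after torch/bitsandbytes have initialized the CUDA context has no effect.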