Possible VAE memory leak?
This issue was pointed out by @handshape regarding some Python bindings I made for stable-diffusion.cpp here. It seems that when using CUBLAS or HIPBLAS, calling free_sd_ctx() does not release all of the VRAM. @handshape describes the issue as follows:
After some experimentation, I've noticed that the amount that stays allocated is always almost exactly the amount allocated for the VAE, plus about 100 MB. VAE tiling reduced the size of the leak, and running the VAE phase on the CPU leaves only the ~100 MB of leftovers. If I had to hazard a guess, more is being allocated in stable-diffusion.cpp's load_from_file() than is freed by the free_sd_ctx() call.
I'm able to reproduce the same behaviour in stable-diffusion.cpp itself by adding a sleep at the end of the CLI example, at line 896 of examples\cli\main.cpp.
At the top of the file (line 1):

```cpp
#include <iostream>
#include <thread>
#include <chrono>
```
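At line 899 (line 896 before the three includes were added):

```cpp
// Placed after the existing cleanup calls:
// free(results);
// free_sd_ctx(sd_ctx);
// free(control_image_buffer);
// free(input_image_buffer);
printf("FINISHED\n");  // trailing newline so the message is flushed before the sleep
// Keep the process alive for 15 seconds so VRAM usage can be checked
// (e.g. with nvidia-smi or rocm-smi) before the process exits.
std::this_thread::sleep_for(std::chrono::seconds(15));
```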
Even though everything should have been unloaded at that point, not all of the VRAM is released until the process exits. Any suggestions?
My C++ is rusty, but in stable-diffusion.cpp, from line 213 (https://github.com/leejet/stable-diffusion.cpp/blob/9c51d8787f78ef1bd0ead1e8f48b766d7ee7484d/stable-diffusion.cpp#L213) to line 274 (https://github.com/leejet/stable-diffusion.cpp/blob/9c51d8787f78ef1bd0ead1e8f48b766d7ee7484d/stable-diffusion.cpp#L274), I see a bunch of calls to std::make_shared().
That's the reference-counting machinery in the standard library, no? I seem to remember some subtle semantics around how a shared pointer knows that nothing references it anymore.