(sd-webui-comfyui) inference with checkpoint/vae/clip using models located in a different process
Hi, I am a maintainer of the sd-webui-comfyui extension for a1111.
The extension provides a comfyui node that reuses the webui checkpoint already loaded in memory, so users don't have to load the same model twice: https://github.com/ModelSurge/sd-webui-comfyui/wiki/Webui-Nodes#webui-checkpoint
The way it has been implemented is very fragile and breaks whenever the implementation of the Model/VAE/Clip (along with ModelPatcher) classes changes. It would be very convenient if external code could be written against a more stable interface to provide custom checkpoint sources.
Using the checkpoint/vae/clip classes as-is is not really an option since their constructors allocate memory and rely on the assumption that an actual model reference will be available in the current process.
In particular, the current code calls .state_dict() in numerous places, which makes it impractical to keep the model in a different process: since the webui model lives in a different process than comfyui, every such call would require copying the entire model over to the comfyui process.
In the case of our a1111 extension, the alternative we found to be a good tradeoff is to keep the webui model in the webui process and instead copy the latents back and forth between the webui and comfyui processes: the latents are moved to cpu, serialized, copied over to the other process, deserialized, and finally moved to the right device for inference.
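Roughly, the round trip per step looks like this (a minimal sketch of the idea, not the actual proxy code; how the bytes cross the process boundary is left out):

```python
import io
import torch

def serialize_latents(latents: torch.Tensor) -> bytes:
    # move off the GPU first so the receiving process needs no CUDA context sharing
    buffer = io.BytesIO()
    torch.save(latents.detach().cpu(), buffer)
    return buffer.getvalue()

def deserialize_latents(payload: bytes, device: str = "cpu") -> torch.Tensor:
    # rebuild the tensor in the receiving process and move it to the inference device
    return torch.load(io.BytesIO(payload), map_location="cpu").to(device)

# the raw bytes are what actually travel between the webui and comfyui processes,
# e.g. over a pipe, a socket, or a shared memory segment
payload = serialize_latents(torch.randn(1, 4, 64, 64))
latents = deserialize_latents(payload)
```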
This is essentially how the unet/vae/clip webui proxy classes are currently implemented, and as you can see, it is a very tedious process: https://github.com/ModelSurge/sd-webui-comfyui/blob/ebed0c432c8264b164b78da6e887d253e9a7aa6c/lib_comfyui/webui/proxies.py
Also, it is very fragile: https://github.com/ModelSurge/sd-webui-comfyui/issues/195
TL;DR: We need a way to run inference through the unet/vae/text encoder classes using models loaded in a different process than comfyui.
You could create an empty comfyui model and set the weights to point to the one from the a1111 model.
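Something along these lines (just a sketch of the idea with a toy module standing in for the real model classes; the point is that load_state_dict with assign=True, available in PyTorch 2.1+, adopts the existing tensors instead of copying them):

```python
import torch
import torch.nn as nn

# toy stand-in for the real comfyui/a1111 model classes
class ToyUnet(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

a1111_model = ToyUnet()  # pretend this is the model webui already has loaded

# build the comfyui-side module on the meta device so no real weights are allocated
with torch.device("meta"):
    comfy_model = ToyUnet()

# point the comfyui module's parameters at the a1111 model's tensors;
# both modules now share the same underlying storage
comfy_model.load_state_dict(a1111_model.state_dict(), assign=True)
assert comfy_model.proj.weight.data_ptr() == a1111_model.proj.weight.data_ptr()
```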
Can this work if the weights are in a different process? IIUC it can't: either the latents have to be copied back and forth between the webui/comfyui processes, or the a1111 model has to be copied over to the comfyui process. The former results in unstable code (as the current implementation shows), and the latter is very slow and takes up memory unnecessarily.
It's possible to share memory between processes but I don't know how it would work with pytorch.
A while ago I tried many approaches with the torch.multiprocessing module but couldn't get it to work, even after a long series of trial and error. If it is possible, it must be a very obscure, poorly documented feature. When I was looking for a checkpoint-sharing solution, I searched for what other people had tried, asked GPT-4 for ideas, tried many alternatives, etc., all to no avail.
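For reference, the standard torch.multiprocessing pattern for CPU tensors looks roughly like the sketch below. It only applies when the child process is spawned through multiprocessing, which is exactly what rules it out for us (see the next point):

```python
import torch
import torch.multiprocessing as mp

def worker(queue: mp.Queue):
    # the child receives a handle to the same underlying storage, not a copy
    shared = queue.get()
    shared.add_(1)  # the change is visible to the parent as well

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    tensor = torch.zeros(4)
    tensor.share_memory_()  # move the CPU tensor into shared memory

    queue = mp.Queue()
    process = mp.Process(target=worker, args=(queue,))
    process.start()
    queue.put(tensor)
    process.join()
    print(tensor)  # tensor([1., 1., 1., 1.])
```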
In any case, with the latest versions of sd-webui-comfyui it's no longer really possible to share memory even through torch.multiprocessing, because the comfyui process is now created with subprocess instead of multiprocessing. We switched to subprocess because some custom comfyui nodes have requirements that are incompatible with the webui venv, so comfyui has to be started in its own venv. That requires a completely separate process, which, as far as I know, cannot inherit shared memory at child-process creation.
Because of this, we had to implement a custom memory-sharing mechanism that relies on either multiprocessing.shared_memory or a temporary file, depending on the user's preference.
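The shared_memory variant works even between unrelated processes, since the segment is addressed by name. A minimal sketch of the idea (both sides shown in one script; the segment name, shape, and dtype would in practice be exchanged over a pipe, and the name here is just illustrative):

```python
from multiprocessing import shared_memory
import numpy as np
import torch

# writer side (e.g. the webui process)
latents = torch.randn(1, 4, 64, 64).cpu().numpy()
shm = shared_memory.SharedMemory(create=True, size=latents.nbytes, name="latents_block")
np.ndarray(latents.shape, dtype=latents.dtype, buffer=shm.buf)[:] = latents

# reader side (e.g. the comfyui process, started via subprocess);
# it only needs the segment name, the shape, and the dtype
shm_reader = shared_memory.SharedMemory(name="latents_block")
array = np.ndarray(latents.shape, dtype=latents.dtype, buffer=shm_reader.buf)
restored = torch.from_numpy(array).clone()  # clone so the segment can be released

shm_reader.close()
shm.close()
shm.unlink()
```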
Next year, I can try to find a way to modify the classes so that copying the latents is easier to implement in the extension and more stable, if you're not completely against the idea.
To be clear, by "next year", I meant this week.
I really hope this is taken more seriously. There are so many things auto1111 can do easily that ComfyUI can't, and vice versa - so having the two work together is imperative imo.