[Feature Request]: Manually load/unload checkpoints into GPU
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What would your feature do?
I want to achieve the following either programmatically or via API:
- List of all checkpoints and their status (loaded or unloaded)
- Load a checkpoint
- Unload a checkpoint
Proposed workflow
- Retrieve the available checkpoints and their status via HTTP request, e.g. http://0.0.0.0:7860/sdapi/v1/checkpoint-status
- Load a specified checkpoint via HTTP request, e.g. http://0.0.0.0:7860/sdapi/v1/load-checkpoint?checkpointid=abc123
- Unload a specified checkpoint via HTTP request, e.g. http://0.0.0.0:7860/sdapi/v1/unload-checkpoint?checkpointid=abc123

The checkpoint parameter passed in steps 2 and 3 should be obtained from step 1. For example, the object returned in step 1 could contain a "uniqueid" key.
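A minimal sketch of how a client might consume the proposed workflow. None of these endpoints exist in webui yet; the response shape (including the "uniqueid" and "loaded" keys) is an assumption drawn from this request:

```python
# Hypothetical response from the proposed /sdapi/v1/checkpoint-status
# endpoint (shape is an assumption from this feature request).
sample_status = [
    {"uniqueid": "abc123", "title": "sd-v1-5.safetensors", "loaded": True},
    {"uniqueid": "def456", "title": "anything-v3.ckpt", "loaded": False},
]

def loaded_ids(status):
    """Ids of checkpoints currently loaded on the GPU."""
    return [c["uniqueid"] for c in status if c["loaded"]]

def unload_url(base, checkpoint_id):
    """URL for the proposed unload endpoint; the id comes from the status call."""
    return f"{base}/sdapi/v1/unload-checkpoint?checkpointid={checkpoint_id}"
```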
Additional information
I've made a fork but it only loads and unloads the currently selected checkpoint. The relevant endpoints are unloadmodel, loadmodel and get_model_status in api.py. https://github.com/AnthoneoJ/stable-diffusion-webui
why?
In my use case, the machine is running multiple AI services (one of them being this webui). There are several machines that do the same. So the checkpoints should be loaded upon machine boot up and unloaded if memory is needed for another AI service, etc.
Model loading is a mess in webui. I suggest you just settle with `Maximum number of checkpoints loaded at the same time` set to 1 and `Only keep one model on device` set to True.
To be honest, the two API endpoints /sdapi/v1/unload-checkpoint and /sdapi/v1/reload-checkpoint work more like putting webui to sleep and waking it from sleep: you can put it to sleep to save VRAM, but you need to manually wake it before use (bad design on our part).
There is an issue with /sdapi/v1/unload-checkpoint: if `Maximum number of checkpoints loaded at the same time` is > 1, the sleep will only send the current main model to RAM. It doesn't distinguish between models; it only cares about the main model. For example, with `Maximum number of checkpoints loaded at the same time` set to 3 and `Only keep one model on device` set to False, after switching models 3 or more times there will be 3 models loaded; if you then use /sdapi/v1/unload-checkpoint, only 1 model will be unloaded and 2 will still be loaded.
Changing the model (loading a model) can be done by POSTing to /sdapi/v1/options with:

```json
{
    "sd_model_checkpoint": "YOUR model"
}
```
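As a sketch, using only the standard library (the base URL and model title below are placeholders; the title must match an entry returned by /sdapi/v1/sd-models):

```python
import json
import urllib.request

def build_options_payload(model_title):
    # Changing sd_model_checkpoint makes webui load that checkpoint.
    return {"sd_model_checkpoint": model_title}

def set_model(base_url, model_title):
    """POST the new checkpoint to the options endpoint."""
    req = urllib.request.Request(
        f"{base_url}/sdapi/v1/options",
        data=json.dumps(build_options_payload(model_title)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example (placeholder title):
# set_model("http://0.0.0.0:7860", "v1-5-pruned-emaonly.safetensors")
```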
Alternatively, you can add an override_settings arg to the payload of a txt2img / img2img API call; this method is generally more reliable when dealing with multiple users:

```json
"override_settings": {
    "sd_model_checkpoint": "YOUR model"
}
```
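A per-request payload might look like the sketch below (prompt, steps, and model title are placeholders; `override_settings_restore_afterwards` controls whether the global option is restored once the request finishes):

```python
def txt2img_payload(prompt, model_title, steps=20):
    # override_settings applies only to this request, so concurrent
    # users don't clobber each other's global sd_model_checkpoint.
    return {
        "prompt": prompt,
        "steps": steps,
        "override_settings": {"sd_model_checkpoint": model_title},
        # Leave the global option untouched after the call.
        "override_settings_restore_afterwards": True,
    }
```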
You can get a list of all models via /sdapi/v1/sd-models.
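A small helper for that (standard library only; each entry returned by /sdapi/v1/sd-models carries a "title" field, which is what the options endpoint expects):

```python
import json
import urllib.request

def titles_from(models):
    """Extract the 'title' field from a /sdapi/v1/sd-models response."""
    return [m["title"] for m in models]

def list_model_titles(base_url):
    with urllib.request.urlopen(f"{base_url}/sdapi/v1/sd-models") as resp:
        return titles_from(json.load(resp))

# Example:
# list_model_titles("http://0.0.0.0:7860")
```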
It should be possible to improve this, but someone needs to want it enough to work on it; it might even be possible to implement this as an extension. I might try to work on this, but no guarantees.
Initially I was confused because I somehow misread your request as wanting to load every model in sequence and then unload them for no apparent reason.
These can also help: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Command-Line-Arguments-and-Settings (`--nowebui`, `--skip-load-model-at-start`)
Ah, thanks! I knew bits and pieces from inspecting the codebase. This puts them all together. One more thing before I can go off on my own: how do I know whether a model is currently loaded or not? At the moment, I'm inferring this from sd_models.model_data.sd_model. If it's None, the model is unloaded, and vice versa.
> `sd_models.model_data.sd_model`
yeah I think that's pretty much the place you want to look
However, if you also use `Checkpoints to cache in RAM` > 0, then I think you also want to inspect shared.opts.sd_checkpoint_cache and checkpoints_loaded.
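A hedged sketch of combining those two signals as one status check (a plain function for illustration: `sd_model` stands in for sd_models.model_data.sd_model, and `checkpoints_loaded` for the RAM cache dict keyed by checkpoint):

```python
def checkpoint_status(sd_model, checkpoints_loaded):
    """Return (is the main model loaded on device?, checkpoints cached in RAM)."""
    return sd_model is not None, list(checkpoints_loaded)
```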
If you have improvements that you think can benefit everyone, don't hesitate to contribute.
Ollama framework has a really handy environment and API accessible variable:
OLLAMA_KEEP_ALIVE=[# of seconds] | [xM] | 0
I think it's mostly used for people who want the last loaded chat model to stay loaded longer. But I use it set to zero to keep the GPU VRAM as empty as possible as soon as possible. This is because I have many users that mostly use the GPU for chat and occasionally for Text-to-speech and SD image creation - loading up the GPU VRAM. Unfortunately SDWeb keeps its last model loaded indefinitely. It would be great if SDWeb had a similar Keep Alive option to let us decide how long to keep the last model loaded.
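A hypothetical sketch of what such a keep-alive could look like; nothing like this exists in webui today, and `unload` here is a placeholder that would wrap the existing /sdapi/v1/unload-checkpoint endpoint:

```python
import threading

class KeepAlive:
    """Unload the model after `keep_alive` idle seconds; 0 = unload immediately."""

    def __init__(self, keep_alive, unload):
        self.keep_alive = keep_alive
        self.unload = unload
        self._timer = None

    def touch(self):
        # Call after every generation request; resets the idle countdown.
        if self._timer is not None:
            self._timer.cancel()
        if self.keep_alive == 0:
            self.unload()
        else:
            self._timer = threading.Timer(self.keep_alive, self.unload)
            self._timer.daemon = True
            self._timer.start()
```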