[bug]: Models constantly unload after every generation
Is there an existing issue for this problem?
- [x] I have searched the existing issues
Operating system
Windows
GPU vendor
Nvidia (CUDA)
GPU model
No response
GPU VRAM
No response
Version number
5.15
Browser
Firefox
Python dependencies
No response
What happened
Models are immediately removed from RAM after every generation, including during batch generations or multiple iterations. So if I have a queue of 10 images to generate, the models are loaded and unloaded 10 times. I have tried setting lazy_offload to true with no effect. As far as I can tell, there is no setting or option to stop this from happening.
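For reference, this is roughly the edit I made in invokeai.yaml (a minimal sketch of my change; everything else is the stock config that shipped with a fresh install):

```yaml
# invokeai.yaml — default config plus the one setting I added
schema_version: 4.0.2
lazy_offload: true
```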
What you expected to happen
I would expect the model to stay in memory for at least a short time, or at the very least for the duration of a batch run. Reloading slows down the process of generating multiple images. Keeping models cached should also be the default behavior.
How to reproduce the problem
No response
Additional context
No response
Discord username
No response
What is your VRAM/RAM cache set at?
> What is your VRAM/RAM cache set at?
And where exactly do I find those details in InvokeAI? I can't find them in the settings page. The config file I am using is the default one that came with a fresh install.
I have noticed that if I use a model that fits entirely in VRAM (16 GB), it does not reload. If it spills into system RAM at all, it is immediately unloaded once image generation is finished, regardless of whether there are more images in the queue.
For some reason the installer had installed version 5.15. I have now updated to 6.0.1, but the behavior is the same.
> And where exactly do I find those details in InvokeAI? I can't find them in the settings page. The config file I am using is the default one that came with a fresh install.
Click the gear icon @ bottom left -> About -> copy the system info
Gear icon -> About -> there is no labelled "system info". There is an open section with what looks like a JSON file in it, containing "version", "dependencies", and "config", and beside that some details about the program itself. I am going to assume you mean the config section of that JSON. Looking through it, I noticed that ram and vram are both set to null, and max_cache_ram_gb and max_cache_vram_gb are both null as well. Here is a paste of the config:
"config": {
    "schema_version": "4.0.2",
    "legacy_models_yaml_path": null,
    "host": "127.0.0.1",
    "port": 9091,
    "allow_origins": [],
    "allow_credentials": true,
    "allow_methods": ["*"],
    "allow_headers": ["*"],
    "ssl_certfile": null,
    "ssl_keyfile": null,
    "log_tokenization": false,
    "patchmatch": true,
    "models_dir": "models",
    "convert_cache_dir": "models\\.convert_cache",
    "download_cache_dir": "models\\.download_cache",
    "legacy_conf_dir": "configs",
    "db_dir": "databases",
    "outputs_dir": "outputs",
    "custom_nodes_dir": "nodes",
    "style_presets_dir": "style_presets",
    "workflow_thumbnails_dir": "workflow_thumbnails",
    "log_handlers": ["console"],
    "log_format": "color",
    "log_level": "info",
    "log_sql": false,
    "log_level_network": "warning",
    "use_memory_db": false,
    "dev_reload": false,
    "profile_graphs": false,
    "profile_prefix": null,
    "profiles_dir": "profiles",
    "max_cache_ram_gb": null,
    "max_cache_vram_gb": null,
    "log_memory_usage": false,
    "device_working_mem_gb": 3,
    "enable_partial_loading": false,
    "keep_ram_copy_of_weights": true,
    "ram": null,
    "vram": null,
    "lazy_offload": true,
    "pytorch_cuda_alloc_conf": null,
    "device": "auto",
    "precision": "auto",
    "sequential_guidance": false,
    "attention_type": "auto",
    "attention_slice_size": "auto",
    "force_tiled_decode": false,
    "pil_compress_level": 1,
    "max_queue_size": 10000,
    "clear_queue_on_startup": false,
    "allow_nodes": null,
    "deny_nodes": null,
    "node_cache_size": 512,
    "hashing_algorithm": "blake3_single",
    "remote_api_tokens": null,
    "scan_models_on_startup": false
},
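If those null cache fields are what matters, I'm guessing they would go in invokeai.yaml along these lines (a sketch based only on the key names in the dump above; the values are just what I would try on a 16 GB card, not from any docs):

```yaml
# invokeai.yaml — guessed values for my hardware, not recommendations
max_cache_ram_gb: 32.0   # how much system RAM the model cache may use
max_cache_vram_gb: 14.0  # how much VRAM the model cache may use, leaving some working headroom on 16 GB
```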
This happens on my setup as well, with a 5090 and 128 GB of RAM.
FLUX Dev keeps reloading every model every time, and other models get reloaded constantly as well. Since the creator of the issue did not provide much info, let me send you more details.
I've seen a warning like this in the log, so it might be the culprit:
WARNING --> [MODEL CACHE] Failed to calculate model size for unexpected model type: <class 'transformers.tokenization_utils_fast.PreTrainedTokenizerFast'>. The model will be treated as having size 0.
VRAM: [usage graph screenshot]
RAM: [usage graph screenshot]
Startup log:
Starting the InvokeAI browser-based UI..
[InvokeAI]::INFO --> Using torch device: NVIDIA GeForce RTX 5090
[InvokeAI]::INFO --> cuDNN version: 90701
[InvokeAI]::INFO --> Patchmatch initialized
[InvokeAI]::INFO --> InvokeAI version 6.8.0
[InvokeAI]::INFO --> Root directory = C:\invokeai
[InvokeAI]::INFO --> Initializing database at C:\invokeai\databases\invokeai.db
[ModelManagerService]::INFO --> [MODEL CACHE] Calculated model RAM cache size: 29534.56 MB. Heuristics applied: [1, 2].
[InvokeAI]::INFO --> Executing queue item 58105, session 194796df-fa5d-44b7-ba4c-21d4916e253c
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 26.13it/s]
[InvokeAI]::INFO --> Cleaned database (freed 1.17MB)
[InvokeAI]::INFO --> Invoke running on http://0.0.0.0:9090 (Press CTRL+C to quit)
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:text_encoder' (GlmModel) onto cuda device in 42.10s. Total model size: 16744.98MB, VRAM: 16744.98MB (100.0%)
[ModelManagerService]::WARNING --> [MODEL CACHE] Failed to calculate model size for unexpected model type: <class 'transformers.tokenization_utils_fast.PreTrainedTokenizerFast'>. The model will be treated as having size 0.
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:tokenizer' (PreTrainedTokenizerFast) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:text_encoder' (GlmModel) onto cuda device in 0.00s. Total model size: 16744.98MB, VRAM: 16744.98MB (100.0%)
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:tokenizer' (PreTrainedTokenizerFast) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 28.58it/s]
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:transformer' (CogView4Transformer2DModel) onto cuda device in 30.14s. Total model size: 12148.13MB, VRAM: 12148.13MB (100.0%)
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:26<00:00, 1.14it/s]
estimate_vae_working_memory_cogview4: 4613734400
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:vae' (AutoencoderKL) onto cuda device in 1.90s. Total model size: 774.58MB, VRAM: 774.58MB (100.0%)
[InvokeAI]::INFO --> Graph stats: 194796df-fa5d-44b7-ba4c-21d4916e253c
Node Calls Seconds VRAM Used
cogview4_model_loader 1 0.004s 0.000G
cogview4_text_encoder 2 43.973s 16.444G
string 1 0.001s 16.400G
integer 1 0.001s 16.400G
cogview4_denoise 1 56.637s 16.402G
core_metadata 1 0.001s 11.872G
cogview4_l2i 1 4.258s 17.385G
TOTAL GRAPH EXECUTION TIME: 104.875s
TOTAL GRAPH WALL TIME: 104.878s
RAM used by InvokeAI process: 14.63G (+13.769G)
RAM used to load models: 28.97G
VRAM in use: 12.630G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 4
Models cached: 3
Models cleared from cache: 1
Cache high water mark: 28.22/0.00G
[InvokeAI]::INFO --> Executing queue item 58106, session e7b59abd-bd2a-4692-9cfe-87fcbf21b074
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 92.61it/s]
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:text_encoder' (GlmModel) onto cuda device in 5.73s. Total model size: 16744.98MB, VRAM: 16744.98MB (100.0%)
[ModelManagerService]::WARNING --> [MODEL CACHE] Failed to calculate model size for unexpected model type: <class 'transformers.tokenization_utils_fast.PreTrainedTokenizerFast'>. The model will be treated as having size 0.
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:tokenizer' (PreTrainedTokenizerFast) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 129.44it/s]
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:transformer' (CogView4Transformer2DModel) onto cuda device in 4.09s. Total model size: 12148.13MB, VRAM: 12148.13MB (100.0%)
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:24<00:00, 1.24it/s]
estimate_vae_working_memory_cogview4: 4613734400
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model 'c9b3f5a7-fad2-4707-b0fb-e0fdc53f8859:vae' (AutoencoderKL) onto cuda device in 0.26s. Total model size: 774.58MB, VRAM: 774.58MB (100.0%)
[InvokeAI]::INFO --> Graph stats: e7b59abd-bd2a-4692-9cfe-87fcbf21b074
Node Calls Seconds VRAM Used
cogview4_model_loader 1 0.001s 12.630G
cogview4_text_encoder 2 7.673s 17.213G
string 1 0.000s 12.630G
integer 1 0.000s 17.161G
cogview4_denoise 1 28.656s 17.161G
core_metadata 1 0.001s 11.873G
cogview4_l2i 1 2.267s 17.386G
TOTAL GRAPH EXECUTION TIME: 38.598s
TOTAL GRAPH WALL TIME: 38.600s
RAM used by InvokeAI process: 14.66G (+0.030G)
RAM used to load models: 28.97G
VRAM in use: 12.631G
RAM cache statistics:
Model cache hits: 4
Model cache misses: 4
Models cached: 3
Models cleared from cache: 1
Cache high water mark: 28.22/0.00G
One more generation log, just in case:
[InvokeAI]::INFO --> Executing queue item 58207, session 614452a8-ddee-4463-9328-a74e8899279c
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 85.05it/s]
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '9299a25a-4111-489e-97f8-fcfd098ef0b1:text_encoder_2' (T5EncoderModel) onto cuda device in 3.14s. Total model size: 9083.39MB, VRAM: 9083.39MB (100.0%)
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '9299a25a-4111-489e-97f8-fcfd098ef0b1:tokenizer_2' (T5TokenizerFast) onto cuda device in 0.00s. Total model size: 0.03MB, VRAM: 0.00MB (0.0%)
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '3bad88b6-c43a-468d-907c-2ebf6b870366:text_encoder' (CLIPTextModel) onto cuda device in 0.06s. Total model size: 469.44MB, VRAM: 469.44MB (100.0%)
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '3bad88b6-c43a-468d-907c-2ebf6b870366:tokenizer' (CLIPTokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '8a48478d-8209-4755-80e8-212be678a68e:transformer' (Flux) onto cuda device in 7.96s. Total model size: 22700.13MB, VRAM: 22700.13MB (100.0%)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:15<00:00, 1.93it/s]
estimate_vae_working_memory_flux: 4613734400
[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '1b3e7358-9440-4f9b-8d15-462f1636fc1c:vae' (AutoEncoder) onto cuda device in 0.03s. Total model size: 159.87MB, VRAM: 159.87MB (100.0%)
[InvokeAI]::INFO --> Graph stats: 614452a8-ddee-4463-9328-a74e8899279c
Node Calls Seconds VRAM Used
flux_model_loader 1 0.000s 22.887G
string 1 0.001s 22.887G
flux_text_encoder 1 5.760s 22.887G
collect 1 0.001s 9.590G
integer 1 0.001s 9.590G
flux_denoise 1 24.588s 23.298G
core_metadata 1 0.001s 22.722G
flux_vae_decode 1 0.428s 25.015G
TOTAL GRAPH EXECUTION TIME: 30.780s
TOTAL GRAPH WALL TIME: 30.781s
RAM used by InvokeAI process: 25.21G (-0.000G)
RAM used to load models: 31.65G
VRAM in use: 22.887G
RAM cache statistics:
Model cache hits: 6
Model cache misses: 6
Models cached: 5
Models cleared from cache: 2
Cache high water mark: 22.78/0.00G
Info:
{
"version": "6.8.0",
"dependencies": {
"absl-py" : "2.3.1",
"accelerate" : "1.10.1",
"annotated-types" : "0.7.0",
"anyio" : "4.11.0",
"attrs" : "25.4.0",
"bidict" : "0.23.1",
"bitsandbytes" : "0.48.1",
"blake3" : "1.0.7",
"certifi" : "2022.12.7",
"cffi" : "2.0.0",
"charset-normalizer" : "2.1.1",
"click" : "8.3.0",
"colorama" : "0.4.6",
"coloredlogs" : "15.0.1",
"compel" : "2.1.1",
"contourpy" : "1.3.3",
"CUDA" : "12.8",
"cycler" : "0.12.1",
"Deprecated" : "1.2.18",
"diffusers" : "0.33.0",
"dnspython" : "2.8.0",
"dynamicprompts" : "0.31.0",
"einops" : "0.8.1",
"fastapi" : "0.118.2",
"fastapi-events" : "0.12.2",
"filelock" : "3.13.1",
"flatbuffers" : "25.9.23",
"fonttools" : "4.60.1",
"fsspec" : "2024.6.1",
"gguf" : "0.17.1",
"h11" : "0.16.0",
"httptools" : "0.6.4",
"huggingface-hub" : "0.35.3",
"humanfriendly" : "10.0",
"idna" : "3.4",
"importlib_metadata" : "7.1.0",
"InvokeAI" : "6.8.0",
"jax" : "0.7.1",
"jaxlib" : "0.7.1",
"Jinja2" : "3.1.4",
"kiwisolver" : "1.4.9",
"MarkupSafe" : "2.1.5",
"matplotlib" : "3.10.7",
"mediapipe" : "0.10.14",
"ml_dtypes" : "0.5.3",
"mpmath" : "1.3.0",
"networkx" : "3.3",
"numpy" : "1.26.3",
"onnx" : "1.16.1",
"onnxruntime" : "1.19.2",
"opencv-contrib-python": "4.11.0.86",
"opt_einsum" : "3.4.0",
"packaging" : "24.1",
"picklescan" : "0.0.31",
"pillow" : "11.0.0",
"prompt_toolkit" : "3.0.52",
"protobuf" : "4.25.8",
"psutil" : "7.1.0",
"pycparser" : "2.23",
"pydantic" : "2.11.10",
"pydantic-settings" : "2.11.0",
"pydantic_core" : "2.33.2",
"pyparsing" : "3.2.5",
"PyPatchMatch" : "1.0.2",
"pyreadline3" : "3.5.4",
"python-dateutil" : "2.9.0.post0",
"python-dotenv" : "1.1.1",
"python-engineio" : "4.12.3",
"python-multipart" : "0.0.20",
"python-socketio" : "5.14.1",
"PyWavelets" : "1.9.0",
"PyYAML" : "6.0.3",
"regex" : "2025.9.18",
"requests" : "2.28.1",
"safetensors" : "0.6.2",
"scipy" : "1.16.2",
"semver" : "3.0.4",
"sentencepiece" : "0.2.0",
"setuptools" : "70.2.0",
"simple-websocket" : "1.1.0",
"six" : "1.17.0",
"sniffio" : "1.3.1",
"sounddevice" : "0.5.2",
"spandrel" : "0.4.1",
"starlette" : "0.48.0",
"sympy" : "1.13.3",
"tokenizers" : "0.22.1",
"torch" : "2.7.1+cu128",
"torchsde" : "0.2.6",
"torchvision" : "0.22.1+cu128",
"tqdm" : "4.66.5",
"trampoline" : "0.1.2",
"transformers" : "4.57.0",
"typing-inspection" : "0.4.2",
"typing_extensions" : "4.12.2",
"urllib3" : "1.26.13",
"uvicorn" : "0.37.0",
"watchfiles" : "1.1.0",
"wcwidth" : "0.2.14",
"websockets" : "15.0.1",
"wrapt" : "1.17.3",
"wsproto" : "1.2.0",
"zipp" : "3.19.2"
},
"config": {
"schema_version": "4.0.2",
"legacy_models_yaml_path": null,
"host": "0.0.0.0",
"port": 9090,
"allow_origins": [],
"allow_credentials": true,
"allow_methods": ["*"],
"allow_headers": ["*"],
"ssl_certfile": null,
"ssl_keyfile": null,
"log_tokenization": false,
"patchmatch": true,
"models_dir": "models",
"convert_cache_dir": "models\\.convert_cache",
"download_cache_dir": "models\\.download_cache",
"legacy_conf_dir": "configs",
"db_dir": "databases",
"outputs_dir": "C:\\invokeai\\outputs",
"custom_nodes_dir": "nodes",
"style_presets_dir": "style_presets",
"workflow_thumbnails_dir": "workflow_thumbnails",
"log_handlers": ["console"],
"log_format": "color",
"log_level": "info",
"log_sql": false,
"log_level_network": "warning",
"use_memory_db": false,
"dev_reload": false,
"profile_graphs": false,
"profile_prefix": null,
"profiles_dir": "profiles",
"max_cache_ram_gb": null,
"max_cache_vram_gb": null,
"log_memory_usage": false,
"device_working_mem_gb": 3,
"enable_partial_loading": false,
"keep_ram_copy_of_weights": true,
"ram": 64,
"vram": null,
"lazy_offload": true,
"pytorch_cuda_alloc_conf": null,
"device": "auto",
"precision": "auto",
"sequential_guidance": false,
"attention_type": "auto",
"attention_slice_size": "auto",
"force_tiled_decode": false,
"pil_compress_level": 1,
"max_queue_size": 10000,
"clear_queue_on_startup": false,
"allow_nodes": null,
"deny_nodes": null,
"node_cache_size": 512,
"hashing_algorithm": "blake3_single",
"remote_api_tokens": null,
"scan_models_on_startup": false,
"unsafe_disable_picklescan": false
},
"set_config_fields": ["legacy_models_yaml_path", "host", "ram", "outputs_dir"]
}
The script I use to start InvokeAI:
@echo off
PUSHD "%~dp0"
setlocal
call .venv\Scripts\activate.bat
set INVOKEAI_ROOT=.
:start
echo Desired action:
echo 1. Generate images with the browser-based interface
echo 2. Open the developer console
echo 3. Command-line help
echo Q - Quit
echo.
echo To update, download and run the installer from https://github.com/invoke-ai/InvokeAI/releases/latest
echo.
set /P choice="Please enter 1-4, Q: [1] "
if not defined choice set choice=1
IF /I "%choice%" == "1" (
echo Starting the InvokeAI browser-based UI..
python .venv\Scripts\invokeai-web.exe %*
) ELSE IF /I "%choice%" == "2" (
echo Developer Console
echo Python command is:
where python
echo Python version is:
python --version
echo *************************
echo You are now in the system shell, with the local InvokeAI Python virtual environment activated,
echo so that you can troubleshoot this InvokeAI installation as necessary.
echo *************************
echo *** Type `exit` to quit this shell and deactivate the Python virtual environment ***
call cmd /k
) ELSE IF /I "%choice%" == "3" (
echo Displaying command line help...
python .venv\Scripts\invokeai-web.exe --help %*
pause
exit /b
) ELSE IF /I "%choice%" == "q" (
echo Goodbye!
goto ending
) ELSE (
echo Invalid selection
pause
exit /b
)
goto start
endlocal
pause
:ending
exit /b
ComfyUI works fine, with no spikes like these.
I'd be more than happy to provide whatever info is needed to fix this issue, or test some workarounds.
I tried adding this to my config:
ram: 64.0
vram: 31.0
max_cache_ram_gb: 64.0
max_cache_vram_gb: 31.0
And it solved the issue of models constantly being reloaded into RAM. The RAM usage graph is nice and flat now:
But the VRAM usage graph still looks the same. I guess I might not have enough VRAM after all, even though the log says "VRAM in use: 22.416G" and I have about 10 GB more than that.