
flux_minimal_inference: CUDA Out of Memory with LORA and negative_prompt

Open mamawr opened this issue 1 year ago • 2 comments

I have 24 GB of VRAM. flux_minimal_inference runs without any problem (with or without negative_prompt). Using a trained LoRA with flux_minimal_inference (without negative_prompt) also causes no problems.

But using flux_minimal_inference with the LoRA and negative_prompt together causes CUDA Out of Memory:

Traceback (most recent call last):
  File "/opt/flux/sd-scripts/flux_minimal_inference.py", line 509, in <module>
    generate_image(
  File "/opt/flux/sd-scripts/flux_minimal_inference.py", line 322, in generate_image
    x = do_sample(
        ^^^^^^^^^^
  File "/opt/flux/sd-scripts/flux_minimal_inference.py", line 174, in do_sample
    x = denoise(
        ^^^^^^^^
  File "/opt/flux/sd-scripts/flux_minimal_inference.py", line 111, in denoise
    pred = model(
           ^^^^^^
  File "/opt/conda/envs/flux/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/flux/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/flux/sd-scripts/library/flux_models.py", line 1050, in forward
    img = block(img, vec=vec, pe=pe, txt_attention_mask=txt_attention_mask)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/flux/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/flux/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/flux/sd-scripts/library/flux_models.py", line 858, in forward
    return self._forward(x, vec, pe, txt_attention_mask)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/flux/sd-scripts/library/flux_models.py", line 833, in _forward
    attn = attention(q, k, v, pe=pe, attn_mask=attn_mask)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/flux/sd-scripts/library/flux_models.py", line 452, in attention
    x = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB. GPU 0 has a total capacity of 23.87 GiB of which 63.62 MiB is free. Including non-PyTorch memory, this process has 23.81 GiB memory in use. Of the allocated memory 23.53 GiB is allocated by PyTorch, and 114.40 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True is already set in my environment.

mamawr avatar Feb 03 '25 20:02 mamawr

Same problem here.

ganzf886 avatar Jul 09 '25 10:07 ganzf886

Please add the --flux_dtype fp8 --offload options to reduce memory usage.

kohya-ss avatar Jul 09 '25 13:07 kohya-ss
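
The two mitigations mentioned in this thread can be combined as sketched below. This is an editor's illustration, not a confirmed fix: the allocator setting only reduces fragmentation (it cannot help when PyTorch genuinely needs more than the 24 GB available, as the ~23.5 GiB allocated in the traceback suggests), while `--flux_dtype fp8` and `--offload` are the flags kohya-ss suggested; all other arguments to the script are placeholders.

```shell
# 1) Reduce allocator fragmentation (already set by the reporter; helps only
#    when "reserved but unallocated" memory is large, not with a true shortfall):
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# 2) kohya-ss's suggestion: load Flux weights in fp8 and offload modules,
#    freeing VRAM for the extra negative_prompt pass (placeholder invocation):
#      python flux_minimal_inference.py ... --flux_dtype fp8 --offload

echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Trying the fp8/offload flags first is the more promising route, since the error shows PyTorch itself holding nearly all of the card's capacity rather than losing it to fragmentation.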