Unable to generate SD3 Medium images with a 24GB GPU
**LocalAI version:** localai/localai:v2.17.1-cublas-cuda12

**Environment, CPU architecture, OS, and Version:** Linux sphinx 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux, RTX 4090
**Describe the bug**
Generating even a 512x512 or 256x256 image runs the 24GB GPU out of memory.
**To Reproduce**

```sh
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a test image",
    "model": "stable-diffusion-3-medium",
    "size": "256x256"
  }'
```
**Expected behavior**
An image is generated.
**Logs**

```
10:29PM ERR Server error error="could not load model (no success): Unexpected err=RuntimeError('CUDA error: out of memory\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n'), type(err)=<class 'RuntimeError'>"
```
Me too. I'm using a 32GB V100; does it really need that much memory?
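For a rough sense of why this can OOM: a back-of-the-envelope estimate of the VRAM needed just to hold SD3 Medium's weights. The parameter counts below are approximate public figures (my assumption, not taken from this thread), but they suggest fp32 loading alone would overflow a 24GB card, while fp16 fits:

```python
# Approximate parameter counts for SD3 Medium (assumption):
#   MMDiT diffusion transformer ~2.0B, CLIP-L ~0.12B, CLIP-G ~0.7B, T5-XXL ~4.8B
params_billion = 2.0 + 0.12 + 0.7 + 4.8  # ~7.6B total

def weights_gb(params_b: float, bytes_per_param: int) -> float:
    """GiB needed just to hold the weights at the given precision."""
    return params_b * 1e9 * bytes_per_param / 1024**3

fp32 = weights_gb(params_billion, 4)  # over 24 GiB before any activations
fp16 = weights_gb(params_billion, 2)  # fits on a 24 GiB card
print(f"fp32: {fp32:.1f} GiB, fp16: {fp16:.1f} GiB")
```

This ignores activations, the VAE, and CUDA context overhead, so the real requirement is somewhat higher in both cases.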
I have the same problem on a 24GB Tesla P40.
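A possible workaround is forcing half precision in the model config. The sketch below mirrors LocalAI's documented diffusers-backend YAML; the file contents, model repo, and option names here are assumptions to verify against your LocalAI version, not a confirmed fix from this thread:

```yaml
# Hypothetical stable-diffusion-3-medium.yaml for LocalAI's diffusers backend.
name: stable-diffusion-3-medium
backend: diffusers
f16: true                         # load weights in half precision
parameters:
  model: stabilityai/stable-diffusion-3-medium-diffusers
diffusers:
  cuda: true
  pipeline_type: StableDiffusion3Pipeline
```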
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.