Flux.1: Access to model black-forest-labs/FLUX.1-schnell is restricted. You must have access to it and be authenticated to access it.
LocalAI version: 2.28.0
Environment, CPU architecture, OS, and Version: Docker, OS: Ubuntu 24.10, CPU: AMD Ryzen 7 9800X3D, GPU: RTX 5090
Describe the bug
Using the flux.1-schnell model to generate an image from the UI throws an error:
failed to load model with internal loader: could not load model (no success): Unexpected err=GatedRepoError('401 Client Error. (Request ID: Root=1-67ff6f8d-71d5575729dabf4c417800f6;0a8a3758-810b-4bb2-8368-6b681bc4a6bb)\n\nCannot access gated repo for url https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/model_index.json.\nAccess to model black-forest-labs/FLUX.1-schnell is restricted. You must have access to it and be authenticated to access it. Please log in.'), type(err)=<class 'huggingface_hub.errors.GatedRepoError'>
The model configuration in the gallery is not usable because https://huggingface.co/black-forest-labs/FLUX.1-schnell cannot be downloaded without authentication. We need to update the configuration to use a mirror.
To Reproduce
- Download the Flux.1-schnell image generation model
- Try to generate an image with it
Expected behavior
no error
Logs
8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/google/protobuf/runtime_version.py:98: UserWarning: Protobuf gencode version 5.29.0 is exactly one major version older than the runtime version 6.30.2 at backend.proto. Please update the gencode to avoid compatibility violations in the next runtime release. 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr warnings.warn( 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/utils/hub.py:105: FutureWarning: Using
TRANSFORMERS_CACHEis deprecated and will be removed in v5 of Transformers. UseHF_HOMEinstead. 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr warnings.warn( 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Server started. Listening on: 127.0.0.1:39481 8:51AM DBG GRPC Service Ready 8:51AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc00034ce58} sizeCache:0 unknownFields:[] Model:black-forest-labs/FLUX.1-schnell ContextSize:1024 Seed:358204100 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:true Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/black-forest-labs/FLUX.1-schnell Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType:FluxPipeline SchedulerType: CUDA:true CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]} 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Loading model black-forest-labs/FLUX.1-schnell... 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Request Model: "black-forest-labs/FLUX.1-schnell" 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr ContextSize: 1024 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Seed: 358204100 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr NBatch: 512 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr F16Memory: true 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr MMap: true 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr LowVRAM: true 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr NGPULayers: 99999999 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr Threads: 8 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr ModelFile: "/build/models/black-forest-labs/FLUX.1-schnell" 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr PipelineType: "FluxPipeline" 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr CUDA: true 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr ModelPath: "/build/models" 8:51AM DBG GRPC(flux.1-schnell-127.0.0.1:39481): stderr 8:51AM ERR Server error error="failed to load model with internal loader: could not load model (no success): Unexpected err=GatedRepoError('401 Client Error. (Request ID: Root=1-67ff6f8d-71d5575729dabf4c417800f6;0a8a3758-810b-4bb2-8368-6b681bc4a6bb)\n\nCannot access gated repo for url https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/model_index.json.\nAccess to model black-forest-labs/FLUX.1-schnell is restricted. You must have access to it and be authenticated to access it. Please log in.'), type(err)=<class 'huggingface_hub.errors.GatedRepoError'>"
Additional context
I use the Docker image localai/localai:latest-aio-gpu-nvidia-cuda-12
I tested flux.1dev-abliteratedv2
DBG context local model name not found, setting to default defaultModelName=stablediffusion
failed starting/connecting to the gRPC service error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:33993: connect: connection refused\""
3:55AM DBG GRPC Service NOT ready
3:55AM ERR Server error error="failed to load model with internal loader: grpc service not ready
This wasn't in the image generation?
With abliteratedv2, it throws this error:
Unexpected err=OutOfMemoryError('CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacity of 31.36 GiB of which 19.62 MiB is free. Including non-PyTorch memory, this process has 31.33 GiB memory in use. Of the allocated memory 30.84 GiB is allocated by PyTorch, and 6.07 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)'), type(err)=<class 'torch.OutOfMemoryError'>
In the logs, this error appears:
NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90. If you want to use the NVIDIA GeForce RTX 5090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
It seems to be fixed in the nightly builds; we need to wait for the next PyTorch release: https://discuss.pytorch.org/t/pytorch-support-for-sm120/216099
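Until a stable PyTorch release with sm_120 support ships, one possible workaround is to swap the backend's PyTorch for a nightly build. This is an untested sketch, assuming the diffusers venv path shown in the logs above and that the cu128 nightly wheel index carries the sm_120 kernels; the container name is illustrative:

# open a shell inside the LocalAI container (adjust the container name)
docker exec -it local-ai bash
# inside the container: replace torch in the diffusers backend venv with a nightly cu128 build
/build/backend/python/diffusers/venv/bin/pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128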
7:55AM DBG context local model name not found, setting to default defaultModelName=stablediffusion 7:55AM DBG Parameter Config: &{PredictionOptions:{BasicModelRequest:{Model:SicariusSicariiStuff/flux.1dev-abliteratedv2} Language: Translate:false N:0 TopP:0xc003f39b30 TopK:0xc003f39b38 Temperature:0xc003f39b40 Maxtokens:0xc003f39b70 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc003f39b68 TypicalP:0xc003f39b60 Seed:0xc003f39b88 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:flux.1dev-abliteratedv2 F16:0xc003f39b19 Threads:0xc003f39b20 Debug:0xc003637960 Roles:map[] Embeddings:0xc003f39b81 Backend:diffusers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:
Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_IMAGE FLAG_ANY] KnownUsecases: PromptStrings:[dogs] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc003f39b58 MirostatTAU:0xc003f39b50 Mirostat:0xc003f39b48 NGPULayers:0xc003f39b78 MMap:0xc003f39b80 MMlock:0xc003f39b81 LowVRAM:0xc003f39b1a Grammar: StopWords:[] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc003637968 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:true PipelineType:FluxPipeline SchedulerType: EnableParameters:num_inference_steps IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:25 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[]} 7:55AM INF BackendLoader starting backend=diffusers modelID=flux.1dev-abliteratedv2 o.model=SicariusSicariiStuff/flux.1dev-abliteratedv2 7:55AM DBG Loading model in memory from file: /build/models/SicariusSicariiStuff/flux.1dev-abliteratedv2 7:55AM DBG Loading Model flux.1dev-abliteratedv2 with gRPC (file: /build/models/SicariusSicariiStuff/flux.1dev-abliteratedv2) (backend: diffusers): {backendString:diffusers model:SicariusSicariiStuff/flux.1dev-abliteratedv2 modelID:flux.1dev-abliteratedv2 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0005e5208 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama2:/build/backend/python/exllama2/run.sh faster-whisper:/build/backend/python/faster-whisper/run.sh kokoro:/build/backend/python/kokoro/run.sh rerankers:/build/backend/python/rerankers/run.sh transformers:/build/backend/python/transformers/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 parallelRequests:false} 7:55AM DBG Loading external backend: /build/backend/python/diffusers/run.sh 7:55AM DBG external backend is file: &{name:run.sh size:73 mode:448 modTime:{wall:0 ext:63880383481 loc:0x5a6d4da0} sys:{Dev:68 Ino:30821868 Nlink:1 Mode:33216 Uid:0 Gid:0 X__pad0:0 Rdev:0 Size:73 Blksize:4096 Blocks:8 Atim:{Sec:1745308434 Nsec:336324098} Mtim:{Sec:1744786681 Nsec:0} Ctim:{Sec:1745308434 
Nsec:335324097} X__unused:[0 0 0]}} 7:55AM DBG Loading GRPC Process: /build/backend/python/diffusers/run.sh 7:55AM DBG GRPC Service for flux.1dev-abliteratedv2 will be running at: '127.0.0.1:37081' 7:55AM DBG GRPC Service state dir: /tmp/go-processmanager1749399755 7:55AM DBG GRPC Service Started 7:55AM DBG Wait for the service to start up 7:55AM DBG Options: ContextSize:1024 Seed:882476338 NBatch:512 F16Memory:true MMap:true LowVRAM:true NGPULayers:99999999 Threads:8 PipelineType:"FluxPipeline" CUDA:true 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stdout Initializing libbackend for diffusers 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stdout virtualenv activated 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stdout activated virtualenv has been ensured 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/google/protobuf/runtime_version.py:98: UserWarning: Protobuf gencode version 5.29.0 is exactly one major version older than the runtime version 6.30.2 at backend.proto. Please update the gencode to avoid compatibility violations in the next runtime release. 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr warnings.warn( 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/utils/hub.py:105: FutureWarning: Using TRANSFORMERS_CACHEis deprecated and will be removed in v5 of Transformers. UseHF_HOMEinstead. 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr warnings.warn( 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr /build/backend/python/diffusers/venv/lib/python3.10/site-packages/torch/cuda/init.py:230: UserWarning: 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation. 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90. 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr If you want to use the NVIDIA GeForce RTX 5090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/ 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr warnings.warn( 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr Server started. 
Listening on: 127.0.0.1:37081 7:55AM DBG GRPC Service Ready 7:55AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:0xc0007f1958} sizeCache:0 unknownFields:[] Model:SicariusSicariiStuff/flux.1dev-abliteratedv2 ContextSize:1024 Seed:882476338 NBatch:512 F16Memory:true MLock:false MMap:true VocabOnly:false LowVRAM:true Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:8 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/SicariusSicariiStuff/flux.1dev-abliteratedv2 Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType:FluxPipeline SchedulerType: CUDA:true CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 LoadFormat: DisableLogStatus:false DType: LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false ModelPath:/build/models LoraAdapters:[] LoraScales:[] Options:[] CacheTypeKey: CacheTypeValue: GrammarTriggers:[]} 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr Loading model SicariusSicariiStuff/flux.1dev-abliteratedv2... 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr Request Model: "SicariusSicariiStuff/flux.1dev-abliteratedv2" 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr ContextSize: 1024 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr Seed: 882476338 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr NBatch: 512 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr F16Memory: true 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr MMap: true 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr LowVRAM: true 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr NGPULayers: 99999999 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr Threads: 8 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr ModelFile: "/build/models/SicariusSicariiStuff/flux.1dev-abliteratedv2" 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr PipelineType: "FluxPipeline" 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr CUDA: true 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr ModelPath: "/build/models" 7:55AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderrFetching 24 files: 100%|██████████| 24/24 [08:15<00:00, 20.64s/it] Loading checkpoint shards: 100%|██████████| 3/3 [00:02<00:00, 1.04it/s]it/s] Loading checkpoint shards: 100%|██████████| 3/3 [00:08<00:00, 2.93s/it] 8:03AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr You set
add_prefix_space. The tokenizer needs to be converted from the slow tokenizers Loading pipeline components...: 100%|██████████| 7/7 [00:12<00:00, 1.75s/it] 8:03AM DBG GRPC(flux.1dev-abliteratedv2-127.0.0.1:37081): stderr It seems like you have activated model offloading by callingenable_model_cpu_offload, but are now manually moving the pipeline to GPU. It is strongly recommended against doing so as memory gains from offloading are likely to be lost. Offloading automatically takes care of moving the individual components vae, text_encoder, text_encoder_2, tokenizer, tokenizer_2, transformer, scheduler, image_encoder, feature_extractor to GPU when needed. To make sure offloading works as expected, you should consider moving the pipeline back to CPU:pipeline.to('cpu')or removing the move altogether if you use offloading. 8:04AM ERR Server error error="failed to load model with internal loader: could not load model (no success): Unexpected err=OutOfMemoryError('CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacity of 31.36 GiB of which 19.62 MiB is free. Including non-PyTorch memory, this process has 31.33 GiB memory in use. Of the allocated memory 30.84 GiB is allocated by PyTorch, and 6.07 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)'), type(err)=<class 'torch.OutOfMemoryError'>" ip=XXX.XXX.XXX.XXX latency=8m50.612670769s method=POST status=500 url=/v1/images/generations
Do you have the same error with the model flux.1dev-abliteratedv2 with and without Python, and do you have the CUDA runtime issue with the model flux.1-schnell?
Image: master-cublas-cuda12-ffmpeg-core
8:52AM DBG context local model name not found, setting to default defaultModelName=stablediffusion
8:52AM DBG Parameter Config: &{PredictionOptions:{BasicModelRequest:{Model:SicariusSicariiStuff/flux.1dev-abliteratedv2} Language: Translate:false N:0 TopP:0xc001b46fa0 TopK:0xc001b46fa8 Temperature:0xc001b46fb0 Maxtokens:0xc001b46fe0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 RepeatLastN:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc001b46fd8 TypicalP:0xc001b46fd0 Seed:0xc001b46ff8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:flux.1dev-abliteratedv2 F16:0xc001b46f89 Threads:0xc001b46f90 Debug:0xc001096318 Roles:map[] Embeddings:0xc001b46ff1 Backend:diffusers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil> Multimodal: JinjaTemplate:false ReplyPrefix:} KnownUsecaseStrings:[FLAG_IMAGE FLAG_ANY] KnownUsecases:<nil> PromptStrings:[Cool pink sports car] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: ResponseFormat: ResponseFormatMap:map[] FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix: ExpectStringsAfterJSON:false PropOrder: SchemaType: GrammarTriggers:[]} NoActionFunctionName: NoActionDescriptionName: ResponseRegex:[] JSONRegexMatch:[] ArgumentRegex:[] ArgumentRegexKey: ArgumentRegexValue: ReplaceFunctionResults:[] ReplaceLLMResult:[] CaptureLLMResult:[] FunctionNameKey: FunctionArgumentsKey:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc001b46fc8 MirostatTAU:0xc001b46fc0 Mirostat:0xc001b46fb8 NGPULayers:0xc001b46fe8 MMap:0xc001b46ff0 MMlock:0xc001b46ff1 LowVRAM:0xc001b46f8a Grammar: StopWords:[] Cutstrings:[] ExtractRegex:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc001b47000 NUMA:false LoraAdapter: LoraBase: LoraAdapters:[] LoraScales:[] LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: LoadFormat: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 DisableLogStatus:false DType: LimitMMPerPrompt:{LimitImagePerPrompt:0 LimitVideoPerPrompt:0 LimitAudioPerPrompt:0} MMProj: FlashAttention:false NoKVOffloading:false CacheTypeK: CacheTypeV: RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 CFGScale:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:true PipelineType:FluxPipeline SchedulerType: EnableParameters:num_inference_steps IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:25 GRPC:{Attempts:0 AttemptsSleepTime:0} TTSConfig:{Voice: AudioPath:} CUDA:false DownloadFiles:[] Description: Usage: Options:[]}
I found a quantized GGUF version of this model: https://huggingface.co/city96/FLUX.1-schnell-gguf/tree/main
Is it compatible with LocalAI, and if so, how do I use it?
To use a custom model, you can follow these steps:
- Create a model file stablediffusion.yaml in the models folder:
name: stablediffusion
backend: stablediffusion-ggml
parameters:
  model: FLUX.1-schnell-gguf
step: 25
cfg_scale: 4.5
options:
- "clip_l_path:clip_l.safetensors"
- "clip_g_path:clip_g.safetensors"
- "t5xxl_path:t5xxl-Q5_0.gguf"
- "sampler:euler"
- Download the required assets to the models folder
- Start LocalAI
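Once LocalAI is running, a quick way to test the configuration is the OpenAI-compatible image endpoint. This is a rough sketch, assuming LocalAI listens on the default port 8080 and uses the model name stablediffusion from the YAML above:

# request an image from the stablediffusion model defined above
curl http://localhost:8080/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{"model": "stablediffusion", "prompt": "a cool pink sports car", "size": "1024x1024"}'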
@Hello-World-Traveler from https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main, do we need to download the whole directory, or is the flux1-dev.safetensors file enough, and how would that affect the YAML file? Thanks!
FLUX.1 [dev] is already available to download on the models page.
This one can be used: https://huggingface.co/city96/FLUX.1-dev-gguf
The model gallery contains a FLUX.1 DEV ggml model that works, but not the schnell version. We should add this one: https://huggingface.co/city96/FLUX.1-schnell-gguf/resolve/main/flux1-schnell-Q2_K.gguf
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
For anyone who is still struggling with that - you need to generate a token on Hugging Face (in the user profile) and then set the environment variable HF_TOKEN (see https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables)
For Docker, add "-e HF_TOKEN=hf_xxx".
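A minimal sketch of a full command with the AIO CUDA 12 image mentioned earlier; the host port and host model path are assumptions, while /build/models matches the ModelPath from the logs above:

# run LocalAI with GPU access and a Hugging Face token for gated repos
docker run -d --name local-ai \
  --gpus all \
  -p 8080:8080 \
  -e HF_TOKEN=hf_xxx \
  -v $PWD/models:/build/models \
  localai/localai:latest-aio-gpu-nvidia-cuda-12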