stable-diffusion.cpp

[Bug] z-image-turbo very slow on mac

Open bjoernrenzel-optadata opened this issue 1 month ago • 7 comments

Git commit

My MacBook with an M3 chip takes 600 seconds per iteration. With ComfyUI, it only takes 19 seconds per iteration. Is this normal? Or am I doing something wrong?

Operating System & Version

macOS 26.1

GGML backends

Metal

Command-line arguments used

./sd --diffusion-model z_image_turbo-Q3_K.gguf --vae vae.sft --llm Qwen3-4B-Instruct-2507-Q4_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." --cfg-scale 1.0 -v --offload-to-cpu --diffusion-fa -H 1024 -W 512

Steps to reproduce

Download https://github.com/leejet/stable-diffusion.cpp/releases/download/master-387-e4c50f1/sd-master-e4c50f1-bin-Darwin-macOS-15.7.2-arm64.zip

Unzip the archive.

Copy the models into the unzipped folder.

Set DYLD_LIBRARY_PATH to the folder that contains the binary.

Run ./sd --diffusion-model z_image_turbo-Q3_K.gguf --vae vae.sft --llm Qwen3-4B-Instruct-2507-Q4_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." --cfg-scale 1.0 -v --offload-to-cpu --diffusion-fa -H 1024 -W 512

What you expected to happen

About 20 seconds per iteration

What actually happened

About 600 seconds per iteration

Logs / error messages / stack trace

No response

Additional context / environment details

No response

bjoernrenzel-optadata avatar Dec 02 '25 14:12 bjoernrenzel-optadata

Can you post the log with -v?

Green-Sky avatar Dec 02 '25 15:12 Green-Sky

./sd --diffusion-model z_image_turbo-Q2_K.gguf --vae z-image-vae.safetensors --llm Qwen3-4B-Instruct-2507-Q2_K.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." --cfg-scale 1.0 -H 1024 -W 512 --steps 1 -v --offload-to-cpu --diffusion-fa
Option:
    n_threads: 6
    mode: img_gen
    model_path:
    wtype: unspecified
    clip_l_path:
    clip_g_path:
    clip_vision_path:
    t5xxl_path:
    llm_path: Qwen3-4B-Instruct-2507-Q2_K.gguf
    llm_vision_path:
    diffusion_model_path: z_image_turbo-Q2_K.gguf
    high_noise_diffusion_model_path:
    vae_path: z-image-vae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    photo_maker_path:
    pm_id_images_dir:
    pm_id_embed_path:
    pm_style_strength: 20.00
    output_path: output.png
    init_image_path:
    end_image_path:
    mask_image_path:
    control_image_path:
    ref_images_paths:
    control_video_path:
    auto_resize_ref_image: true
    increase_ref_index: false
    offload_params_to_cpu: true
    clip_on_cpu: false
    control_net_cpu: false
    vae_on_cpu: false
    diffusion flash attention: true
    diffusion Conv2d direct: false
    vae_conv_direct: false
    control_strength: 0.90
    prompt: A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.
    negative_prompt:
    clip_skip: -1
    width: 512
    height: 1024
    sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 1, eta: 0.00, shifted_timestep: 0)
    high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
    moe_boundary: 0.875
    prediction: default
    lora_apply_mode: auto
    flow_shift: inf
    strength(img2img): 0.75
    rng: cuda
    sampler rng: NONE
    seed: 42
    batch_count: 1
    vae_tiling: false
    force_sdxl_vae_conv_scale: false
    upscale_repeats: 1
    chroma_use_dit_mask: true
    chroma_use_t5_mask: false
    chroma_t5_mask_pad: 1
    video_frames: 1
    easycache: disabled (threshold=0.200, start=0.15, end=0.95)
    vace_strength: 1.00
    fps: 16
    preview_mode: none (denoised)
    preview_interval: 1
System Info:
    SSE3 = 0
    AVX = 0
    AVX2 = 0
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 0
    NEON = 1
    ARM_FMA = 1
    F16C = 0
    FP16_VA = 1
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:189 - Using CPU backend
[INFO ] stable-diffusion.cpp:234 - loading diffusion model from 'z_image_turbo-Q2_K.gguf'
[INFO ] model.cpp:378 - load z_image_turbo-Q2_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from 'z_image_turbo-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:281 - loading llm from 'Qwen3-4B-Instruct-2507-Q2_K.gguf'
[INFO ] model.cpp:378 - load Qwen3-4B-Instruct-2507-Q2_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from 'Qwen3-4B-Instruct-2507-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:295 - loading vae from 'z-image-vae.safetensors'
[INFO ] model.cpp:381 - load z-image-vae.safetensors using safetensors format
[DEBUG] model.cpp:511 - init from 'z-image-vae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:318 - Version: Z-Image
[INFO ] stable-diffusion.cpp:346 - Weight type stat: f32: 640 | q8_0: 22 | q2_K: 324 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:347 - Conditioner weight type stat: f32: 145 | q2_K: 144 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:348 - Diffusion model weight type stat: f32: 251 | q8_0: 22 | q2_K: 180
[INFO ] stable-diffusion.cpp:349 - VAE weight type stat: f32: 244
[DEBUG] stable-diffusion.cpp:351 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285 - merges size 151387
[DEBUG] llm.hpp:317 - vocab size: 151665
[INFO ] stable-diffusion.cpp:535 - Using flash attention in the diffusion model
[DEBUG] ggml_extend.hpp:1877 - qwen3 params backend buffer size = 2765.94 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1877 - z_image params backend buffer size = 2472.32 MB(RAM) (453 tensors)
[DEBUG] ggml_extend.hpp:1877 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:683 - loading weights
[DEBUG] model.cpp:1359 - using 6 threads for model loading
[DEBUG] model.cpp:1381 - loading tensors from z_image_turbo-Q2_K.gguf
|====================> | 453/1095 - 1126.87it/s
[DEBUG] model.cpp:1381 - loading tensors from Qwen3-4B-Instruct-2507-Q2_K.gguf
|======================================> | 851/1095 - 1049.32it/s
[DEBUG] model.cpp:1381 - loading tensors from z-image-vae.safetensors
|==================================================| 1095/1095 - 1080.95it/s
[INFO ] model.cpp:1590 - loading tensors completed, taking 1.01s (process: 0.00s, read: 0.58s, memcpy: 0.00s, convert: 0.02s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:782 - total params memory size = 5332.83MB (VRAM 0.00MB, RAM 5332.83MB): text_encoders 2765.94MB(RAM), diffusion_model 2472.32MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:883 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:908 - finished loaded file
[DEBUG] stable-diffusion.cpp:3138 - generate_image 512x1024
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler method
[INFO ] denoiser.hpp:364 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[DEBUG] conditioner.hpp:1701 - parse '<|im_start|>user A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.<|im_end|> <|im_start|>assistant ' to [['<|im_start|>user ', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.', 1], ['<|im_end|> <|im_start|>assistant ', 1], ]
[DEBUG] llm.hpp:259 - split prompt "<|im_start|>user " to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259 - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", ]
[DEBUG] llm.hpp:259 - split prompt "<|im_end|> <|im_start|>assistant " to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
[DEBUG] ggml_extend.hpp:1691 - qwen3 compute buffer size: 3.72 MB(RAM)
[DEBUG] conditioner.hpp:1896 - computing condition graph completed, taking 4440 ms
[INFO ] stable-diffusion.cpp:2917 - get_learned_condition completed, taking 4441 ms
[INFO ] stable-diffusion.cpp:3028 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1691 - z_image compute buffer size: 281.77 MB(RAM)
|==================================================| 1/1 - 570.76s/it
[INFO ] stable-diffusion.cpp:3069 - sampling completed, taking 570.80s
[INFO ] stable-diffusion.cpp:3077 - generating 1 latent images completed, taking 570.84s
[INFO ] stable-diffusion.cpp:3080 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1691 - vae compute buffer size: 3328.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:2286 - computing vae decode graph completed, taking 19.75s
[INFO ] stable-diffusion.cpp:3090 - latent 1 decoded, taking 19.75s
[INFO ] stable-diffusion.cpp:3094 - decode_first_stage completed, taking 19.75s
[INFO ] stable-diffusion.cpp:3390 - generate_image completed in 595.03s
save result PNG image to 'output.png' (success)

bjoernrenzel-optadata avatar Dec 02 '25 15:12 bjoernrenzel-optadata
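To put the numbers in this log next to the ComfyUI figure from the report, here is a rough shell sketch. The function names `sec_per_it` and `slowdown` are made up for illustration and are not part of stable-diffusion.cpp. The sketch assumes the sampling progress line ends in a `<number>s/it` token, as it does in the log above.

```shell
#!/bin/sh
# Illustrative helpers only; neither function ships with stable-diffusion.cpp.

# Read log text on stdin and print the number in front of "s/it"
# for every line that contains such a token (the last one on the line wins).
sec_per_it() {
    sed -n 's|.* \([0-9][0-9.]*\)s/it.*|\1|p'
}

# Print how many times slower $1 is than $2, rounded to one decimal place.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}
```

With the values from this thread, `slowdown 570.76 19` prints `30.0`, so the run logged above is about 30 times slower per iteration than the ComfyUI timing quoted in the report.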

[DEBUG] stable-diffusion.cpp:189 - Using CPU backend

Looks like it's not using Metal

stduhpf avatar Dec 02 '25 16:12 stduhpf
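A quick way to run the same check on any `-v` log is to pull out the `Using <name> backend` line. This is a small sketch; `detect_backend` is a hypothetical helper name, and the pattern is based only on the two log lines that appear in this thread (`Using CPU backend` and `Using Metal backend`).

```shell
#!/bin/sh
# Illustrative helper; not part of stable-diffusion.cpp.

# Read log text on stdin and print the backend name from the first
# line that contains "Using <name> backend".
detect_backend() {
    sed -n 's/.*Using \([A-Za-z0-9]*\) backend.*/\1/p' | head -n 1
}
```

Usage, assuming the log is captured from both output streams: `./sd ... -v 2>&1 | detect_backend`. If this prints `CPU` on a Mac, the binary is not using the GPU.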

I used the precompiled release version. Can I enable Metal?

bjoernrenzel-optadata avatar Dec 02 '25 16:12 bjoernrenzel-optadata

The precompiled version does not support Metal. If you need Metal support, you will have to compile it yourself. You can refer to this guide for compilation: https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/build.md

leejet avatar Dec 02 '25 17:12 leejet
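For orientation, a Metal build usually looks roughly like the sketch below. It assumes the CMake option is `-DSD_METAL=ON` and that the repository needs a recursive clone for its ggml submodule. Check both against the linked build guide, since options can change between versions. The wrapper name `build_sd_metal` is made up here so that nothing runs until the function is called.

```shell
#!/bin/sh
# Sketch of a from-source Metal build; verify the flags against docs/build.md.
# The body runs in a subshell so the caller's working directory is unchanged.
build_sd_metal() (
    set -e
    # Assumption: ggml is a git submodule, hence --recursive.
    git clone --recursive https://github.com/leejet/stable-diffusion.cpp
    cd stable-diffusion.cpp
    mkdir -p build
    cd build
    # Assumption: SD_METAL is the CMake switch that enables the Metal backend.
    cmake .. -DSD_METAL=ON
    cmake --build . --config Release
)
```

After a successful build the binary should be under `build/bin/sd`, and a `-v` run should log `Using Metal backend` instead of `Using CPU backend`.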

After compiling, Metal is used. Unfortunately, I now get a different error when loading the GGUF text encoder.

`./sd --diffusion-model z_image_turbo-Q2_K.gguf --vae z-image-vae.safetensors --llm Qwen3-4B-Instruct-2507-Q2_K.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." --cfg-scale 1.0 -H 1024 -W 512 --steps 1 -v --offload-to-cpu --diffusion-fa
Option:
    n_threads: 6
    mode: img_gen
    model_path:
    wtype: unspecified
    clip_l_path:
    clip_g_path:
    clip_vision_path:
    t5xxl_path:
    llm_path: Qwen3-4B-Instruct-2507-Q2_K.gguf
    llm_vision_path:
    diffusion_model_path: z_image_turbo-Q2_K.gguf
    high_noise_diffusion_model_path:
    vae_path: z-image-vae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    photo_maker_path:
    pm_id_images_dir:
    pm_id_embed_path:
    pm_style_strength: 20.00
    output_path: output.png
    init_image_path:
    end_image_path:
    mask_image_path:
    control_image_path:
    ref_images_paths:
    control_video_path:
    auto_resize_ref_image: true
    increase_ref_index: false
    offload_params_to_cpu: true
    clip_on_cpu: false
    control_net_cpu: false
    vae_on_cpu: false
    diffusion flash attention: true
    diffusion Conv2d direct: false
    vae_conv_direct: false
    control_strength: 0.90
    prompt: A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.
    negative_prompt:
    clip_skip: -1
    width: 512
    height: 1024
    sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 1, eta: 0.00, shifted_timestep: 0)
    high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
    moe_boundary: 0.875
    prediction: default
    lora_apply_mode: auto
    flow_shift: inf
    strength(img2img): 0.75
    rng: cuda
    sampler rng: NONE
    seed: 42
    batch_count: 1
    vae_tiling: false
    force_sdxl_vae_conv_scale: false
    upscale_repeats: 1
    chroma_use_dit_mask: true
    chroma_use_t5_mask: false
    chroma_t5_mask_pad: 1
    video_frames: 1
    easycache: disabled (threshold=0.200, start=0.15, end=0.95)
    vace_strength: 1.00
    fps: 16
    preview_mode: none (denoised)
    preview_interval: 1
System Info:
    SSE3 = 0
    AVX = 0
    AVX2 = 0
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 0
    NEON = 1
    ARM_FMA = 1
    F16C = 0
    FP16_VA = 1
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:163 - Using Metal backend
[INFO ] ggml_extend.hpp:69 - ggml_metal_library_init: using embedded metal library
[INFO ] ggml_extend.hpp:69 - ggml_metal_library_init: loaded in 0.006 sec
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU name: Apple M3 Pro
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: simdgroup reduction = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: simdgroup matrix mul. = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: has unified memory = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: has bfloat = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: use residency sets = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: use shared buffers = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: recommendedMaxWorkingSetSize = 30150.67 MB
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: allocating
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: found device: Apple M3 Pro
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: picking default device: Apple M3 Pro
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use bfloat = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use fusion = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use concurrency = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use graph optimize = true
[INFO ] stable-diffusion.cpp:234 - loading diffusion model from 'z_image_turbo-Q2_K.gguf'
[INFO ] model.cpp:378 - load z_image_turbo-Q2_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from 'z_image_turbo-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:281 - loading llm from 'Qwen3-4B-Instruct-2507-Q2_K.gguf'
[INFO ] model.cpp:378 - load Qwen3-4B-Instruct-2507-Q2_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from 'Qwen3-4B-Instruct-2507-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:295 - loading vae from 'z-image-vae.safetensors'
[INFO ] model.cpp:381 - load z-image-vae.safetensors using safetensors format
[DEBUG] model.cpp:511 - init from 'z-image-vae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:318 - Version: Z-Image
[INFO ] stable-diffusion.cpp:346 - Weight type stat: f32: 640 | q8_0: 22 | q2_K: 324 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:347 - Conditioner weight type stat: f32: 145 | q2_K: 144 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:348 - Diffusion model weight type stat: f32: 251 | q8_0: 22 | q2_K: 180
[INFO ] stable-diffusion.cpp:349 - VAE weight type stat: f32: 244
[DEBUG] stable-diffusion.cpp:351 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285 - merges size 151387
[DEBUG] llm.hpp:317 - vocab size: 151665
[INFO ] stable-diffusion.cpp:535 - Using flash attention in the diffusion model
[DEBUG] ggml_extend.hpp:1877 - qwen3 params backend buffer size = 2765.94 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1877 - z_image params backend buffer size = 2472.32 MB(RAM) (453 tensors)
[DEBUG] ggml_extend.hpp:1877 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:683 - loading weights
[DEBUG] model.cpp:1359 - using 6 threads for model loading
[DEBUG] model.cpp:1381 - loading tensors from z_image_turbo-Q2_K.gguf
|====================> | 453/1095 - 1104.88it/s
[DEBUG] model.cpp:1381 - loading tensors from Qwen3-4B-Instruct-2507-Q2_K.gguf
|======================================> | 851/1095 - 1049.32it/s
[DEBUG] model.cpp:1381 - loading tensors from z-image-vae.safetensors
|==================================================| 1095/1095 - 1078.82it/s
[INFO ] model.cpp:1590 - loading tensors completed, taking 1.02s (process: 0.00s, read: 0.57s, memcpy: 0.00s, convert: 0.02s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:782 - total params memory size = 5332.83MB (VRAM 5332.83MB, RAM 0.00MB): text_encoders 2765.94MB(VRAM), diffusion_model 2472.32MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:883 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:908 - finished loaded file
[DEBUG] stable-diffusion.cpp:3138 - generate_image 512x1024
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler method
[INFO ] denoiser.hpp:364 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[DEBUG] conditioner.hpp:1701 - parse '<|im_start|>user A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.<|im_end|> <|im_start|>assistant ' to [['<|im_start|>user ', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.', 1], ['<|im_end|> <|im_start|>assistant ', 1], ]
[DEBUG] llm.hpp:259 - split prompt "<|im_start|>user " to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259 - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", ]
[DEBUG] llm.hpp:259 - split prompt "<|im_end|> <|im_start|>assistant " to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
[INFO ] ggml_extend.hpp:1791 - qwen3 offload params (2765.94 MB, 398 tensors) to runtime backend (Metal), taking 0.28s
[DEBUG] ggml_extend.hpp:1691 - qwen3 compute buffer size: 3.72 MB(VRAM)
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32 0x10692af70 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rms_norm_mul_f32_4', name = 'kernel_rms_norm_mul_f32_4'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_rms_norm_mul_f32_4 0x10692b770 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_q4_K_f32', name = 'kernel_mul_mm_q4_K_f32_bci=0_bco=1'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_q4_K_f32_bci=0_bco=1 0x10692bf70 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32 0x10692c770 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_q2_K_f32', name = 'kernel_mul_mm_q2_K_f32_bci=0_bco=1'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_q2_K_f32_bci=0_bco=1 0x10692cf70 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32 0x10692d270 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1 0x10692d570 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_scale_f32_4', name = 'kernel_scale_f32_4'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_scale_f32_4 0x10692dd70 | th_max = 1024 | th_width = 32
[ERROR] ggml_extend.hpp:75 - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
/Users/admin/Downloads/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
(lldb) process attach --pid 38923
Process 38923 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x0000000197b5242c libsystem_kernel.dylib__wait4 + 8
libsystem_kernel.dylib__wait4:
-> 0x197b5242c <+8>: b.lo 0x197b5244c ; <+40>
   0x197b52430 <+12>: pacibsp
   0x197b52434 <+16>: stp x29, x30, [sp, #-0x10]!
   0x197b52438 <+20>: mov x29, sp
Target 0: (sd) stopped.
Executable binary set to "/Users/admin/Downloads/stable-diffusion.cpp/build/bin/sd".
Architecture set to: arm64-apple-macosx-.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000197b5242c libsystem_kernel.dylib__wait4 + 8
    frame #1: 0x0000000104c6300c sdggml_abort + 156
    frame #2: 0x0000000104c5af7c sdggml_metal_op_encode + 2348
    frame #3: 0x0000000104c5a470 sd__ggml_metal_set_n_cb_block_invoke + 180
    frame #4: 0x0000000104c5a0b4 sdggml_metal_graph_compute + 400
    frame #5: 0x0000000104c7a7fc sdggml_backend_graph_compute + 32
    frame #6: 0x0000000104b11c38 sdGGMLRunner::compute(std::__1::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) + 584
    frame #7: 0x0000000104b37f04 sdLLMEmbedder::get_learned_condition(ggml_context*, int, ConditionerParams const&) + 2356
    frame #8: 0x0000000104add914 sdgenerate_image_internal(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int, sd_guidance_params_t, float, int, int, int, sample_method_t, std::__1::vector<float, std::__1::allocator<float>> const&, long long, int, sd_image_t, float, sd_pm_params_t, std::__1::vector<sd_image_t*, std::__1::allocator<sd_image_t*>>, std::__1::vector<ggml_tensor*, std::__1::allocator<ggml_tensor*>>, bool, ggml_tensor*, ggml_tensor*, sd_easycache_params_t const*) + 3232
    frame #9: 0x0000000104ae50f8 sdgenerate_image + 5720
    frame #10: 0x0000000104a6a45c sdmain + 4356
    frame #11: 0x00000001977c5d54 dyldstart + 7184
(lldb) quit
zsh: abort ./sd --diffusion-model z_image_turbo-Q2_K.gguf --vae z-image-vae.safetensors`

brenzel avatar Dec 02 '25 20:12 brenzel

precompiled

What’s the point of releasing an Apple binary without Metal support?

calvin2021y avatar Dec 08 '25 06:12 calvin2021y