[Bug] z-image-turbo is very slow on Mac
Git commit
My MacBook with an M3 chip takes 600 seconds per iteration, while ComfyUI on the same machine takes only 19 seconds per iteration. Is this normal, or am I doing something wrong?
Operating System & Version
macOS 26.1
GGML backends
Metal
Command-line arguments used
./sd --diffusion-model z_image_turbo-Q3_K.gguf --vae vae.sft --llm Qwen3-4B-Instruct-2507-Q4_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." --cfg-scale 1.0 -v --offload-to-cpu --diffusion-fa -H 1024 -W 512
Steps to reproduce
Download https://github.com/leejet/stable-diffusion.cpp/releases/download/master-387-e4c50f1/sd-master-e4c50f1-bin-Darwin-macOS-15.7.2-arm64.zip
Unzip it.
Copy the models into the same folder.
Set DYLD_LIBRARY_PATH to the folder containing the binary.
Run ./sd --diffusion-model z_image_turbo-Q3_K.gguf --vae vae.sft --llm Qwen3-4B-Instruct-2507-Q4_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." --cfg-scale 1.0 -v --offload-to-cpu --diffusion-fa -H 1024 -W 512
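For the `DYLD_LIBRARY_PATH` step, a minimal sketch for zsh or bash, assuming the release was unzipped into a folder held in the hypothetical variable `SD_BIN_DIR`. It prepends that folder instead of overwriting any existing value. macOS purges `DYLD_*` variables when it launches system-protected programs, so export it in the same interactive shell that then runs `./sd`.

```shell
# Hypothetical location of the unzipped release; adjust to where you extracted it.
SD_BIN_DIR="${SD_BIN_DIR:-$HOME/sd-bin}"

# Prepend the folder so the dynamic loader can find any .dylib shipped next to sd,
# keeping whatever DYLD_LIBRARY_PATH already contained.
export DYLD_LIBRARY_PATH="$SD_BIN_DIR${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"

echo "$DYLD_LIBRARY_PATH"
```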
What you expected to happen
About 20 seconds per step
What actually happened
About 600 seconds per step
Logs / error messages / stack trace
No response
Additional context / environment details
No response
Can you post the log with -v?
```
./sd --diffusion-model z_image_turbo-Q2_K.gguf --vae z-image-vae.safetensors --llm Qwen3-4B-Instruct-2507-Q2_K.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." --cfg-scale 1.0 -H 1024 -W 512 --steps 1 -v --offload-to-cpu --diffusion-fa
Option:
    n_threads: 6
    mode: img_gen
    model_path:
    wtype: unspecified
    clip_l_path:
    clip_g_path:
    clip_vision_path:
    t5xxl_path:
    llm_path: Qwen3-4B-Instruct-2507-Q2_K.gguf
    llm_vision_path:
    diffusion_model_path: z_image_turbo-Q2_K.gguf
    high_noise_diffusion_model_path:
    vae_path: z-image-vae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    photo_maker_path:
    pm_id_images_dir:
    pm_id_embed_path:
    pm_style_strength: 20.00
    output_path: output.png
    init_image_path:
    end_image_path:
    mask_image_path:
    control_image_path:
    ref_images_paths:
    control_video_path:
    auto_resize_ref_image: true
    increase_ref_index: false
    offload_params_to_cpu: true
    clip_on_cpu: false
    control_net_cpu: false
    vae_on_cpu: false
    diffusion flash attention: true
    diffusion Conv2d direct: false
    vae_conv_direct: false
    control_strength: 0.90
    prompt: A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.
    negative_prompt:
    clip_skip: -1
    width: 512
    height: 1024
    sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 1, eta: 0.00, shifted_timestep: 0)
    high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
    moe_boundary: 0.875
    prediction: default
    lora_apply_mode: auto
    flow_shift: inf
    strength(img2img): 0.75
    rng: cuda
    sampler rng: NONE
    seed: 42
    batch_count: 1
    vae_tiling: false
    force_sdxl_vae_conv_scale: false
    upscale_repeats: 1
    chroma_use_dit_mask: true
    chroma_use_t5_mask: false
    chroma_t5_mask_pad: 1
    video_frames: 1
    easycache: disabled (threshold=0.200, start=0.15, end=0.95)
    vace_strength: 1.00
    fps: 16
    preview_mode: none (denoised)
    preview_interval: 1
System Info:
    SSE3 = 0
    AVX = 0
    AVX2 = 0
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 0
    NEON = 1
    ARM_FMA = 1
    F16C = 0
    FP16_VA = 1
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:189 - Using CPU backend
[INFO ] stable-diffusion.cpp:234 - loading diffusion model from 'z_image_turbo-Q2_K.gguf'
[INFO ] model.cpp:378 - load z_image_turbo-Q2_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from 'z_image_turbo-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:281 - loading llm from 'Qwen3-4B-Instruct-2507-Q2_K.gguf'
[INFO ] model.cpp:378 - load Qwen3-4B-Instruct-2507-Q2_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from 'Qwen3-4B-Instruct-2507-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:295 - loading vae from 'z-image-vae.safetensors'
[INFO ] model.cpp:381 - load z-image-vae.safetensors using safetensors format
[DEBUG] model.cpp:511 - init from 'z-image-vae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:318 - Version: Z-Image
[INFO ] stable-diffusion.cpp:346 - Weight type stat: f32: 640 | q8_0: 22 | q2_K: 324 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:347 - Conditioner weight type stat: f32: 145 | q2_K: 144 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:348 - Diffusion model weight type stat: f32: 251 | q8_0: 22 | q2_K: 180
[INFO ] stable-diffusion.cpp:349 - VAE weight type stat: f32: 244
[DEBUG] stable-diffusion.cpp:351 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285 - merges size 151387
[DEBUG] llm.hpp:317 - vocab size: 151665
[INFO ] stable-diffusion.cpp:535 - Using flash attention in the diffusion model
[DEBUG] ggml_extend.hpp:1877 - qwen3 params backend buffer size = 2765.94 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1877 - z_image params backend buffer size = 2472.32 MB(RAM) (453 tensors)
[DEBUG] ggml_extend.hpp:1877 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:683 - loading weights
[DEBUG] model.cpp:1359 - using 6 threads for model loading
[DEBUG] model.cpp:1381 - loading tensors from z_image_turbo-Q2_K.gguf
|====================> | 453/1095 - 1126.87it/s
[DEBUG] model.cpp:1381 - loading tensors from Qwen3-4B-Instruct-2507-Q2_K.gguf
|======================================> | 851/1095 - 1049.32it/s
[DEBUG] model.cpp:1381 - loading tensors from z-image-vae.safetensors
|==================================================| 1095/1095 - 1080.95it/s
[INFO ] model.cpp:1590 - loading tensors completed, taking 1.01s (process: 0.00s, read: 0.58s, memcpy: 0.00s, convert: 0.02s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:782 - total params memory size = 5332.83MB (VRAM 0.00MB, RAM 5332.83MB): text_encoders 2765.94MB(RAM), diffusion_model 2472.32MB(RAM), vae 94.57MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:883 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:908 - finished loaded file
[DEBUG] stable-diffusion.cpp:3138 - generate_image 512x1024
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler method
[INFO ] denoiser.hpp:364 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[DEBUG] conditioner.hpp:1701 - parse '<|im_start|>user A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.<|im_end|> <|im_start|>assistant ' to [['<|im_start|>user ', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.', 1], ['<|im_end|> <|im_start|>assistant ', 1], ]
[DEBUG] llm.hpp:259 - split prompt "<|im_start|>user " to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259 - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", ]
[DEBUG] llm.hpp:259 - split prompt "<|im_end|> <|im_start|>assistant " to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
[DEBUG] ggml_extend.hpp:1691 - qwen3 compute buffer size: 3.72 MB(RAM)
[DEBUG] conditioner.hpp:1896 - computing condition graph completed, taking 4440 ms
[INFO ] stable-diffusion.cpp:2917 - get_learned_condition completed, taking 4441 ms
[INFO ] stable-diffusion.cpp:3028 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1691 - z_image compute buffer size: 281.77 MB(RAM)
|==================================================| 1/1 - 570.76s/it
[INFO ] stable-diffusion.cpp:3069 - sampling completed, taking 570.80s
[INFO ] stable-diffusion.cpp:3077 - generating 1 latent images completed, taking 570.84s
[INFO ] stable-diffusion.cpp:3080 - decoding 1 latents
[DEBUG] ggml_extend.hpp:1691 - vae compute buffer size: 3328.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:2286 - computing vae decode graph completed, taking 19.75s
[INFO ] stable-diffusion.cpp:3090 - latent 1 decoded, taking 19.75s
[INFO ] stable-diffusion.cpp:3094 - decode_first_stage completed, taking 19.75s
[INFO ] stable-diffusion.cpp:3390 - generate_image completed in 595.03s
save result PNG image to 'output.png' (success)
```
[DEBUG] stable-diffusion.cpp:189 - Using CPU backend
Looks like it's not using Metal.
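Whether Metal was picked up can be read straight from a saved `-v` log: the run above reports `Using CPU backend`, while a Metal build reports `Using Metal backend`. A small sketch, assuming the output was captured with something like `./sd ... -v 2>&1 | tee sd.log`; the file name and the helper name `sd_backend` are hypothetical.

```shell
# Hypothetical helper: print the backend name ("CPU", "Metal", ...) that sd
# reported in a log captured with -v. Only the first match is printed.
sd_backend() {
  # Matches text such as "Using CPU backend" or "Using Metal backend".
  grep -o 'Using [[:alpha:]]* backend' "$1" | awk 'NR == 1 { print $2 }'
}
```

Running `sd_backend sd.log` on the log above would print `CPU`.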
I used the precompiled release build. Can I enable Metal?
The precompiled version does not support Metal. If you need Metal support, you will have to compile it yourself. You can refer to this guide for compilation: https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/build.md
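A sketch of that build on macOS, assuming git, CMake and the Xcode command-line tools are installed, and that the CMake switch is `SD_METAL` as in the project's build instructions at the time of this issue (check docs/build.md if it has changed). The function name `build_sd_metal` is hypothetical; its body runs in a subshell so `set -e` and the `cd` calls do not leak into your interactive shell.

```shell
# Hypothetical helper: clone stable-diffusion.cpp and build it with the Metal backend.
build_sd_metal() (
  set -e
  # --recursive also fetches the ggml submodule, which provides the Metal backend.
  git clone --recursive https://github.com/leejet/stable-diffusion.cpp
  cd stable-diffusion.cpp
  mkdir -p build
  cd build
  # Enable ggml's Metal backend; the release zip was built without it.
  cmake .. -DSD_METAL=ON
  cmake --build . --config Release
)
```

After `build_sd_metal` finishes, the binary is at `stable-diffusion.cpp/build/bin/sd`, and a `-v` run should print `Using Metal backend` instead of `Using CPU backend`.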
After compiling, Metal is used. Unfortunately, I now get a different error: the GGUF text encoder aborts because the Metal backend does not support the `DIAG_MASK_INF` op.
```
./sd --diffusion-model z_image_turbo-Q2_K.gguf --vae z-image-vae.safetensors --llm Qwen3-4B-Instruct-2507-Q2_K.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." --cfg-scale 1.0 -H 1024 -W 512 --steps 1 -v --offload-to-cpu --diffusion-fa
Option:
    n_threads: 6
    mode: img_gen
    model_path:
    wtype: unspecified
    clip_l_path:
    clip_g_path:
    clip_vision_path:
    t5xxl_path:
    llm_path: Qwen3-4B-Instruct-2507-Q2_K.gguf
    llm_vision_path:
    diffusion_model_path: z_image_turbo-Q2_K.gguf
    high_noise_diffusion_model_path:
    vae_path: z-image-vae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    photo_maker_path:
    pm_id_images_dir:
    pm_id_embed_path:
    pm_style_strength: 20.00
    output_path: output.png
    init_image_path:
    end_image_path:
    mask_image_path:
    control_image_path:
    ref_images_paths:
    control_video_path:
    auto_resize_ref_image: true
    increase_ref_index: false
    offload_params_to_cpu: true
    clip_on_cpu: false
    control_net_cpu: false
    vae_on_cpu: false
    diffusion flash attention: true
    diffusion Conv2d direct: false
    vae_conv_direct: false
    control_strength: 0.90
    prompt: A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.
    negative_prompt:
    clip_skip: -1
    width: 512
    height: 1024
    sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 1, eta: 0.00, shifted_timestep: 0)
    high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
    moe_boundary: 0.875
    prediction: default
    lora_apply_mode: auto
    flow_shift: inf
    strength(img2img): 0.75
    rng: cuda
    sampler rng: NONE
    seed: 42
    batch_count: 1
    vae_tiling: false
    force_sdxl_vae_conv_scale: false
    upscale_repeats: 1
    chroma_use_dit_mask: true
    chroma_use_t5_mask: false
    chroma_t5_mask_pad: 1
    video_frames: 1
    easycache: disabled (threshold=0.200, start=0.15, end=0.95)
    vace_strength: 1.00
    fps: 16
    preview_mode: none (denoised)
    preview_interval: 1
System Info:
    SSE3 = 0
    AVX = 0
    AVX2 = 0
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 0
    NEON = 1
    ARM_FMA = 1
    F16C = 0
    FP16_VA = 1
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:163 - Using Metal backend
[INFO ] ggml_extend.hpp:69 - ggml_metal_library_init: using embedded metal library
[INFO ] ggml_extend.hpp:69 - ggml_metal_library_init: loaded in 0.006 sec
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU name: Apple M3 Pro
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: simdgroup reduction = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: simdgroup matrix mul. = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: has unified memory = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: has bfloat = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: use residency sets = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: use shared buffers = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_device_init: recommendedMaxWorkingSetSize = 30150.67 MB
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: allocating
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: found device: Apple M3 Pro
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: picking default device: Apple M3 Pro
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use bfloat = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use fusion = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use concurrency = true
[INFO ] ggml_extend.hpp:69 - ggml_metal_init: use graph optimize = true
[INFO ] stable-diffusion.cpp:234 - loading diffusion model from 'z_image_turbo-Q2_K.gguf'
[INFO ] model.cpp:378 - load z_image_turbo-Q2_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from 'z_image_turbo-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:281 - loading llm from 'Qwen3-4B-Instruct-2507-Q2_K.gguf'
[INFO ] model.cpp:378 - load Qwen3-4B-Instruct-2507-Q2_K.gguf using gguf format
[DEBUG] model.cpp:420 - init from 'Qwen3-4B-Instruct-2507-Q2_K.gguf'
[INFO ] stable-diffusion.cpp:295 - loading vae from 'z-image-vae.safetensors'
[INFO ] model.cpp:381 - load z-image-vae.safetensors using safetensors format
[DEBUG] model.cpp:511 - init from 'z-image-vae.safetensors', prefix = 'vae.'
[INFO ] stable-diffusion.cpp:318 - Version: Z-Image
[INFO ] stable-diffusion.cpp:346 - Weight type stat: f32: 640 | q8_0: 22 | q2_K: 324 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:347 - Conditioner weight type stat: f32: 145 | q2_K: 144 | q3_K: 72 | q4_K: 36 | q6_K: 1
[INFO ] stable-diffusion.cpp:348 - Diffusion model weight type stat: f32: 251 | q8_0: 22 | q2_K: 180
[INFO ] stable-diffusion.cpp:349 - VAE weight type stat: f32: 244
[DEBUG] stable-diffusion.cpp:351 - ggml tensor size = 400 bytes
[DEBUG] llm.hpp:285 - merges size 151387
[DEBUG] llm.hpp:317 - vocab size: 151665
[INFO ] stable-diffusion.cpp:535 - Using flash attention in the diffusion model
[DEBUG] ggml_extend.hpp:1877 - qwen3 params backend buffer size = 2765.94 MB(RAM) (398 tensors)
[DEBUG] ggml_extend.hpp:1877 - z_image params backend buffer size = 2472.32 MB(RAM) (453 tensors)
[DEBUG] ggml_extend.hpp:1877 - vae params backend buffer size = 94.57 MB(RAM) (138 tensors)
[DEBUG] stable-diffusion.cpp:683 - loading weights
[DEBUG] model.cpp:1359 - using 6 threads for model loading
[DEBUG] model.cpp:1381 - loading tensors from z_image_turbo-Q2_K.gguf
|====================> | 453/1095 - 1104.88it/s
[DEBUG] model.cpp:1381 - loading tensors from Qwen3-4B-Instruct-2507-Q2_K.gguf
|======================================> | 851/1095 - 1049.32it/s
[DEBUG] model.cpp:1381 - loading tensors from z-image-vae.safetensors
|==================================================| 1095/1095 - 1078.82it/s
[INFO ] model.cpp:1590 - loading tensors completed, taking 1.02s (process: 0.00s, read: 0.57s, memcpy: 0.00s, convert: 0.02s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:782 - total params memory size = 5332.83MB (VRAM 5332.83MB, RAM 0.00MB): text_encoders 2765.94MB(VRAM), diffusion_model 2472.32MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:883 - running in FLOW mode
[DEBUG] stable-diffusion.cpp:908 - finished loaded file
[DEBUG] stable-diffusion.cpp:3138 - generate_image 512x1024
[INFO ] stable-diffusion.cpp:3169 - sampling using Euler method
[INFO ] denoiser.hpp:364 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3282 - TXT2IMG
[INFO ] stable-diffusion.cpp:1167 - apply at runtime
[DEBUG] conditioner.hpp:1701 - parse '<|im_start|>user A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.<|im_end|> <|im_start|>assistant ' to [['<|im_start|>user ', 1], ['A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night.', 1], ['<|im_end|> <|im_start|>assistant ', 1], ]
[DEBUG] llm.hpp:259 - split prompt "<|im_start|>user " to tokens ["<|im_start|>", "user", "Ċ", ]
[DEBUG] llm.hpp:259 - split prompt "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night." to tokens ["A", "Ġcinematic", ",", "Ġmelanch", "olic", "Ġphotograph", "Ġof", "Ġa", "Ġsolitary", "Ġhood", "ed", "Ġfigure", "Ġwalking", "Ġthrough", "Ġa", "Ġsprawling", ",", "Ġrain", "-s", "lick", "ed", "Ġmet", "ropolis", "Ġat", "Ġnight", ".", ]
[DEBUG] llm.hpp:259 - split prompt "<|im_end|> <|im_start|>assistant " to tokens ["<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", ]
[INFO ] ggml_extend.hpp:1791 - qwen3 offload params (2765.94 MB, 398 tensors) to runtime backend (Metal), taking 0.28s
[DEBUG] ggml_extend.hpp:1691 - qwen3 compute buffer size: 3.72 MB(VRAM)
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_get_rows_f32', name = 'kernel_get_rows_f32'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_get_rows_f32 0x10692af70 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rms_norm_mul_f32_4', name = 'kernel_rms_norm_mul_f32_4'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_rms_norm_mul_f32_4 0x10692b770 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_q4_K_f32', name = 'kernel_mul_mm_q4_K_f32_bci=0_bco=1'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_q4_K_f32_bci=0_bco=1 0x10692bf70 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_cpy_f32_f32', name = 'kernel_cpy_f32_f32'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_cpy_f32_f32 0x10692c770 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_q2_K_f32', name = 'kernel_mul_mm_q2_K_f32_bci=0_bco=1'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_q2_K_f32_bci=0_bco=1 0x10692cf70 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_rope_neox_f32', name = 'kernel_rope_neox_f32'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_rope_neox_f32 0x10692d270 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_mul_mm_f32_f32', name = 'kernel_mul_mm_f32_f32_bci=0_bco=1'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_mul_mm_f32_f32_bci=0_bco=1 0x10692d570 | th_max = 1024 | th_width = 32
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: compiling pipeline: base = 'kernel_scale_f32_4', name = 'kernel_scale_f32_4'
[DEBUG] ggml_extend.hpp:66 - ggml_metal_library_compile_pipeline: loaded kernel_scale_f32_4 0x10692dd70 | th_max = 1024 | th_width = 32
[ERROR] ggml_extend.hpp:75 - ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'
/Users/admin/Downloads/stable-diffusion.cpp/ggml/src/ggml-metal/ggml-metal-ops.cpp:201: unsupported op
(lldb) process attach --pid 38923
Process 38923 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x0000000197b5242c libsystem_kernel.dylib`__wait4 + 8
libsystem_kernel.dylib`__wait4:
->  0x197b5242c <+8>: b.lo 0x197b5244c ; <+40>
    0x197b52430 <+12>: pacibsp
    0x197b52434 <+16>: stp x29, x30, [sp, #-0x10]!
    0x197b52438 <+20>: mov x29, sp
Target 0: (sd) stopped.
Executable binary set to "/Users/admin/Downloads/stable-diffusion.cpp/build/bin/sd".
Architecture set to: arm64-apple-macosx-.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x0000000197b5242c libsystem_kernel.dylib`__wait4 + 8
    frame #1: 0x0000000104c6300c sd`ggml_abort + 156
    frame #2: 0x0000000104c5af7c sd`ggml_metal_op_encode + 2348
    frame #3: 0x0000000104c5a470 sd`__ggml_metal_set_n_cb_block_invoke + 180
    frame #4: 0x0000000104c5a0b4 sd`ggml_metal_graph_compute + 400
    frame #5: 0x0000000104c7a7fc sd`ggml_backend_graph_compute + 32
    frame #6: 0x0000000104b11c38 sd`GGMLRunner::compute(std::__1::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) + 584
    frame #7: 0x0000000104b37f04 sd`LLMEmbedder::get_learned_condition(ggml_context*, int, ConditionerParams const&) + 2356
    frame #8: 0x0000000104add914 sd`generate_image_internal(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int, sd_guidance_params_t, float, int, int, int, sample_method_t, std::__1::vector<float, std::__1::allocator<float>> const&, long long, int, sd_image_t, float, sd_pm_params_t, std::__1::vector<sd_image_t*, std::__1::allocator<sd_image_t*>>, std::__1::vector<ggml_tensor*, std::__1::allocator<ggml_tensor*>>, bool, ggml_tensor*, ggml_tensor*, sd_easycache_params_t const*) + 3232
    frame #9: 0x0000000104ae50f8 sd`generate_image + 5720
    frame #10: 0x0000000104a6a45c sd`main + 4356
    frame #11: 0x00000001977c5d54 dyld`start + 7184
(lldb) quit
zsh: abort ./sd --diffusion-model z_image_turbo-Q2_K.gguf --vae z-image-vae.safetensors
```
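The op that the Metal backend rejected can be pulled out of a saved log the same way. A sketch, assuming the output was saved to a file; `sd_unsupported_ops` is a hypothetical helper name. For the log above it would report `DIAG_MASK_INF`, raised while the Qwen3 text encoder was computing the prompt condition (frame #7, `LLMEmbedder::get_learned_condition`).

```shell
# Hypothetical helper: list the distinct ggml ops reported as unsupported in an sd log.
sd_unsupported_ops() {
  # Only lines that name the op in quotes match, e.g.
  #   "ggml_metal_op_encode_impl: error: unsupported op 'DIAG_MASK_INF'"
  # The bare "...ggml-metal-ops.cpp:201: unsupported op" abort line is skipped.
  grep -o "unsupported op '[[:upper:][:digit:]_]*'" "$1" | cut -d"'" -f2 | sort -u
}
```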
> precompiled
What’s the point of releasing an Apple binary without Metal support?