Black video output when using Wan 2.2 with Vulkan on macOS

Open • debamitro opened this issue 3 months ago • 0 comments
Here's the command-line output:
build/bin/sd -M vid_gen --diffusion-model ../wan2.2_ti2v_5B_fp16.safetensors --vae ../wan2.2_vae.safetensors --t5xxl ../umt5_xxl_fp16.safetensors -p "a walking cat" --cfg-scale 6.0 --sampling-method euler -v -W 480 -H 832 --offload-to-cpu --diffusion-fa --flow-shift 3.0 --clip-on-cpu -n "blurry" --vae-on-cpu --video-frames 16
Option: 
    n_threads:                         4
    mode:                              vid_gen
    model_path:                        
    wtype:                             unspecified
    clip_l_path:                       
    clip_g_path:                       
    clip_vision_path:                  
    t5xxl_path:                        ../umt5_xxl_fp16.safetensors
    diffusion_model_path:              ../wan2.2_ti2v_5B_fp16.safetensors
    high_noise_diffusion_model_path:   
    vae_path:                          ../wan2.2_vae.safetensors
    taesd_path:                        
    esrgan_path:                       
    control_net_path:                  
    embedding_dir:                     
    photo_maker_path:                  
    pm_id_images_dir:                  
    pm_id_embed_path:                  
    pm_style_strength:                 20.00
    output_path:                       output.png
    init_image_path:                   
    end_image_path:                    
    mask_image_path:                   
    control_image_path:                
    ref_images_paths:
    control_video_path:                
    increase_ref_index:                false
    offload_params_to_cpu:             true
    clip_on_cpu:                       true
    control_net_cpu:                   false
    vae_on_cpu:                        true
    diffusion flash attention:         true
    diffusion Conv2d direct:           false
    vae_conv_direct:                   false
    control_strength:                  0.90
    prompt:                            a walking cat
    negative_prompt:                   blurry
    clip_skip:                         -1
    width:                             480
    height:                            832
    sample_params:                     (txt_cfg: 6.00, img_cfg: 6.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler, sample_steps: 20, eta: 0.00, shifted_timestep: 0)
    high_noise_sample_params:          (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
    moe_boundary:                      0.875
    flow_shift:                        3.00
    strength(img2img):                 0.75
    rng:                               cuda
    seed:                              42
    batch_count:                       1
    vae_tiling:                        false
    upscale_repeats:                   1
    chroma_use_dit_mask:               true
    chroma_use_t5_mask:                false
    chroma_t5_mask_pad:                1
    video_frames:                      16
    vace_strength:                     1.00
    fps:                               16
System Info: 
    SSE3 = 0
    AVX = 0
    AVX2 = 0
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 0
    NEON = 1
    ARM_FMA = 1
    F16C = 0
    FP16_VA = 1
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:152  - Using Vulkan backend
[DEBUG] ggml_extend.hpp:62   - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:62   - ggml_vulkan: 0 = Apple M4 (MoltenVK) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 0 | matrix cores: none
[INFO ] stable-diffusion.cpp:208  - loading diffusion model from '../wan2.2_ti2v_5B_fp16.safetensors'
[INFO ] model.cpp:1044 - load ../wan2.2_ti2v_5B_fp16.safetensors using safetensors format
[DEBUG] model.cpp:1151 - init from '../wan2.2_ti2v_5B_fp16.safetensors', prefix = 'model.diffusion_model.'
[INFO ] stable-diffusion.cpp:248  - loading t5xxl from '../umt5_xxl_fp16.safetensors'
[INFO ] model.cpp:1044 - load ../umt5_xxl_fp16.safetensors using safetensors format
[DEBUG] model.cpp:1151 - init from '../umt5_xxl_fp16.safetensors', prefix = 'text_encoders.t5xxl.transformer.'
[INFO ] stable-diffusion.cpp:255  - loading vae from '../wan2.2_vae.safetensors'
[INFO ] model.cpp:1044 - load ../wan2.2_vae.safetensors using safetensors format
[DEBUG] model.cpp:1151 - init from '../wan2.2_vae.safetensors', prefix = 'vae.'
[DEBUG] model.cpp:1784 - patch_embedding_channels 147456
[INFO ] stable-diffusion.cpp:267  - Version: Wan 2.2 TI2V 
[INFO ] stable-diffusion.cpp:298  - Weight type:                 f16
[INFO ] stable-diffusion.cpp:299  - Conditioner weight type:     f16
[INFO ] stable-diffusion.cpp:300  - Diffusion model weight type: f16
[INFO ] stable-diffusion.cpp:301  - VAE weight type:             NONE
[DEBUG] stable-diffusion.cpp:303  - ggml tensor size = 400 bytes
[INFO ] stable-diffusion.cpp:338  - CLIP: Using CPU backend
[INFO ] stable-diffusion.cpp:342  - Using flash attention in the diffusion model
[INFO ] wan.hpp:2131 - Wan2.2-TI2V-5B
[DEBUG] ggml_extend.hpp:1729 - t5 params backend buffer size =  10835.86 MB(RAM) (242 tensors)
[DEBUG] ggml_extend.hpp:1729 - Wan2.2-TI2V-5B params backend buffer size =  9540.93 MB(RAM) (825 tensors)
[INFO ] stable-diffusion.cpp:456  - VAE Autoencoder: Using CPU backend
[DEBUG] ggml_extend.hpp:1729 - wan_vae params backend buffer size =  1344.24 MB(RAM) (196 tensors)
[DEBUG] stable-diffusion.cpp:565  - loading weights
[DEBUG] model.cpp:1961 - using 4 threads for model loading
[DEBUG] model.cpp:2044 - loading tensors from ../wan2.2_ti2v_5B_fp16.safetensors
  |================================>                 | 825/1263 - 44.45it/s
[DEBUG] model.cpp:2044 - loading tensors from ../umt5_xxl_fp16.safetensors
  |==========================================>       | 1067/1263 - 21.82it/s
[DEBUG] model.cpp:2044 - loading tensors from ../wan2.2_vae.safetensors
  |==================================================| 1263/1263 - 22.83it/s
[INFO ] model.cpp:2288 - loading tensors completed, taking 55.32s (process: 0.00s, read: 54.38s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:661  - total params memory size = 21721.03MB (VRAM 9540.93MB, RAM 12180.10MB): text_encoders 10835.86MB(RAM), diffusion_model 9540.93MB(VRAM), vae 1344.24MB(RAM), controlnet 0.00MB(VRAM), pmid 0.00MB(RAM)
[INFO ] stable-diffusion.cpp:701  - running in FLOW mode
[DEBUG] stable-diffusion.cpp:725  - finished loaded file
[INFO ] stable-diffusion.cpp:2493 - generate_video 480x832x13
[INFO ] stable-diffusion.cpp:874  - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:894  - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:895  - prompt after extract and remove lora: "a walking cat"
[DEBUG] conditioner.hpp:1267 - parse 'a walking cat' to [['a walking cat', 1], ]
[DEBUG] t5.hpp:402  - token length: 512
[DEBUG] ggml_extend.hpp:1553 - t5 compute buffer size: 297.00 MB(RAM)
[DEBUG] conditioner.hpp:1359 - computing condition graph completed, taking 32373 ms
[DEBUG] conditioner.hpp:1267 - parse 'blurry' to [['blurry', 1], ]
[DEBUG] t5.hpp:402  - token length: 512
[DEBUG] ggml_extend.hpp:1553 - t5 compute buffer size: 297.00 MB(RAM)
[DEBUG] conditioner.hpp:1359 - computing condition graph completed, taking 22357 ms
[INFO ] stable-diffusion.cpp:2757 - get_learned_condition completed, taking 54737 ms
[DEBUG] stable-diffusion.cpp:2819 - sample 30x52x4
[INFO ] ggml_extend.hpp:1653 - Wan2.2-TI2V-5B offload params (9540.93 MB, 825 tensors) to runtime backend (Vulkan0), taking 42.38s
[DEBUG] ggml_extend.hpp:1553 - Wan2.2-TI2V-5B compute buffer size: 156.82 MB(VRAM)
  |==================================================| 20/20 - 24.16s/it
[INFO ] stable-diffusion.cpp:2846 - sampling completed, taking 519.22s
[INFO ] stable-diffusion.cpp:2867 - generating latent video completed, taking 519.50s
[DEBUG] ggml_extend.hpp:1553 - wan_vae compute buffer size: 20112.94 MB(RAM)
[DEBUG] stable-diffusion.cpp:1547 - computing vae decode graph completed, taking 790.34s
[INFO ] stable-diffusion.cpp:2870 - decode_first_stage completed, taking 790.34s
[INFO ] stable-diffusion.cpp:2890 - generate_video completed in 1364.57s
save result MJPG AVI video to 'output.avi'
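
A note on the shapes in the log above: the command asks for `--video-frames 16` at 480x832, but the log reports `generate_video 480x832x13` and `sample 30x52x4`. The sketch below reproduces those numbers under two assumptions that are inferred from the log, not taken from the stable-diffusion.cpp source: the frame count is rounded down to the nearest value of the form 4k+1, and the Wan 2.2 TI2V VAE compresses by 16x spatially and 4x temporally. The function name `wan_latent_shape` is hypothetical.

```python
def wan_latent_shape(width, height, frames, spatial=16, temporal=4):
    """Return (effective_frames, (latent_w, latent_h, latent_t)).

    Assumptions (inferred from the log, not from the library source):
    - frames are rounded down to the nearest 4k + 1
    - the VAE downsamples by `spatial` in width/height and `temporal` in time
    """
    # Round the requested frame count down to the form temporal * k + 1.
    effective_frames = (frames - 1) // temporal * temporal + 1
    # The first frame maps to one latent frame; every further group of
    # `temporal` frames adds one more.
    latent_t = (effective_frames - 1) // temporal + 1
    return effective_frames, (width // spatial, height // spatial, latent_t)


frames, latent = wan_latent_shape(480, 832, 16)
print(frames, latent)
```

With the values from this run, the result matches the `480x832x13` and `30x52x4` lines in the log, so the drop from 16 to 13 frames appears to be expected rounding and not part of the black-output problem.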
Sep 27 '25 02:09 debamitro