
macbook m1, sdxl0.9 model, comfyui generation speed is much slower than webui, why?

[Open] guoreex opened this issue 2 years ago • 16 comments

I am a novice, so my question may be a bit simple for everyone; I hope someone is willing to answer, thank you. Recently I have been using sdxl0.9 in ComfyUI and auto1111, and their generation speeds are very different. Computer: MacBook Pro M1, 16GB RAM.

ComfyUI: 70s/it

auto1111 webui dev: 5s/it

ComfyUI's unique workflow is very attractive, but the speed on Mac M1 is frustrating. Is anyone else in the same situation as me?
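
For scale, here is a quick back-of-envelope sketch (illustrative Python, using the per-step rates reported above and a typical 20-step run):

```python
# Rough comparison of total sampling time, using the rates reported above.
steps = 20
print(f"ComfyUI: {steps * 70 / 60:.1f} min/image")  # 70 s/it -> ~23.3 min
print(f"A1111:   {steps * 5 / 60:.1f} min/image")   # 5 s/it  -> ~1.7 min
```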

guoreex avatar Jul 20 '23 14:07 guoreex

Yes, I don't have a Mac to test on, so the speed is not optimized. You can try launching it with --force-fp16; if it works, it will increase speed.
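
A minimal sanity check, assuming your PyTorch build includes the MPS backend (this snippet is illustrative and not part of ComfyUI):

```python
# Check that fp16 math works on Apple's MPS backend before launching
# ComfyUI with: python main.py --force-fp16
import torch

print(torch.__version__)
print("MPS available:", torch.backends.mps.is_available())

# If this raises an error, --force-fp16 is unlikely to help.
x = torch.randn(64, 64, device="mps", dtype=torch.float16)
print((x @ x).dtype)  # expect torch.float16
```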

comfyanonymous avatar Jul 20 '23 14:07 comfyanonymous

Thanks for your reply. --force-fp16 doesn't seem to work for me. I want to provide more test information, but I'm not a programmer and don't know what to do to help ComfyUI run better on Mac.

guoreex avatar Jul 21 '23 01:07 guoreex

--force-fp16 works when using the nightly version of PyTorch. It made the workflow much faster. Using the default SDXL workflow on an MBP 16 M1 Pro 16GB:

100%|███████████████████████████████████████████| 20/20 [01:49<00:00,  5.46s/it]
100%|█████████████████████████████████████████████| 5/5 [00:36<00:00,  7.21s/it]
Prompt executed in 208.15 seconds
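
If you are unsure whether you are on a nightly build, the version string gives it away; a one-liner (illustrative):

```python
import torch

# Nightly builds report versions like "2.1.0.devYYYYMMDD"; stable builds do not.
print(torch.__version__)
```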

spikeyslam avatar Jul 31 '23 00:07 spikeyslam

I'm getting 6~7s/it on an M1 Max 64GB, SDXL 1.0, running both base & refiner with --force-fp16

100%|██████████| 15/15 [01:35<00:00,  6.36s/it]
Prompt executed in 193.71 seconds # dpp + karras

(the attached image may have the workflow embedded in it)

zenyr avatar Aug 01 '23 09:08 zenyr

When I started using ComfyUI with PyTorch nightly for macOS at the beginning of August, the generation speed on my M2 Max with 96GB RAM was on par with A1111/SD.Next. Progressively, it seemed to get a bit slower, but negligibly.

However, at some point in the last two days, I noticed a drastic decrease in performance: ComfyUI generated images in twice the time it normally would with the same sampler, steps, CFG, etc.

In an attempt to fix things, I updated to today's PyTorch nightly* and the generation speed returned to approximately what I remembered.

Right now, I generate an image with the SDXL Base + Refiner models with the following settings:

macOS: 13.5.1 (22G90)
Base checkpoint: sd_xl_base_1.0_0.9vae
Refiner checkpoint: sd_xl_refiner_1.0_0.9vae
Image size: 1344x768px
Sampler: DPM++ 2s Ancestral
Scheduler: Karras
Steps: 70
CFG Scale: 10
Aesthetic Score: 6

at the following speed:

(Base)    100%|##########| 52/52 [03:31<00:00, 4.06s/it]
(Refiner) 100%|##########| 18/18 [01:44<00:00, 5.83s/it]

If anyone could do a run with the same settings and let me know their results, I'd be grateful.

*To update your PyTorch nightly: exit ComfyUI, make sure you are still in the virtual environment, and then run: pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
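
A short sketch to confirm the upgrade took effect (assuming the standard venv layout of a ComfyUI install; despite the /cpu index URL, the macOS nightly wheels have historically still exposed MPS):

```python
# Verify the freshly installed nightly build sees the Apple GPU.
import torch

print("torch:", torch.__version__)  # expect a .dev nightly version
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())
```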

alessandroperilli avatar Sep 14 '23 21:09 alessandroperilli

Since I wrote the previous reply, I've experienced even more erratic behavior: now, when I activate things like a LoRA or ControlNet, the generation time goes up dramatically and, it seems, in proportion to how many things I activate. If I turn on 1 LoRA and 3 ControlNets, it takes something like 12 minutes to generate 1 image.

It never happened before and, clearly, there's something wrong. It might be something I've done with my workflow, but it's odd.

@comfyanonymous, does ComfyUI have a debug flag where I can see everything happening in the terminal, like SD.next?

alessandroperilli avatar Sep 17 '23 17:09 alessandroperilli

+1

I'm getting between 1.5it/s and 3it/s running SDXL 1.0 base without refiner with --force-fp16 on M2 Max with 96GB RAM.

neilmendoza avatar Oct 06 '23 18:10 neilmendoza

I'm curious what version of macOS you are running? I started using ComfyUI today because Automatic1111 was crashing, and it appears related to the macOS 14 Sonoma upgrade, so I'm curious if this processing speed issue could also be related. ComfyUI isn't anywhere near as fast as Automatic was before the crashing started.

rogueturnip avatar Oct 09 '23 22:10 rogueturnip

> I'm curious what version of macOS you are running? I started using ComfyUI today because Automatic1111 was crashing, and it appears related to the macOS 14 Sonoma upgrade, so I'm curious if this processing speed issue could also be related. ComfyUI isn't anywhere near as fast as Automatic was before the crashing started.

I'm running 13.6, Ventura.

neilmendoza avatar Oct 09 '23 22:10 neilmendoza

> I'm curious what version of macOS you are running? I started using ComfyUI today because Automatic1111 was crashing, and it appears related to the macOS 14 Sonoma upgrade, so I'm curious if this processing speed issue could also be related. ComfyUI isn't anywhere near as fast as Automatic was before the crashing started.

14.0 (23A344), Sonoma.

alessandroperilli avatar Oct 09 '23 23:10 alessandroperilli

> *To update your PyTorch nightly: exit ComfyUI, make sure you are still in the virtual environment, and then run: pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

Thanks a lot. I checked some solutions online, and today I solved the ComfyUI speed problem by upgrading the PyTorch version in my local environment:

pip install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

After the upgrade: PyTorch 2.2.0.dev20231009, Python 3.11.4

My system: macOS Sonoma 14.0, MacBook Pro M1, 16GB, sd_xl_base_1.0_0.9vae & sd_xl_refiner_1.0_0.9vae

Image size: 1024x1024

Launching it with --force-fp16. Generation speed:
100%|██████████| 20/20 [01:42<00:00, 5.12s/it]
100%|██████████| 20/20 [02:03<00:00, 6.15s/it]

guoreex avatar Oct 10 '23 06:10 guoreex

@guoreex can you run a generation with my same exact parameters and report your speed?

Image size: 1344x768px
Sampler: DPM++ 2s Ancestral
Scheduler: Karras
Steps: 70
CFG Scale: 10 (Pos & Neg)
Aesthetic Score: 6

(Your Base steps should be 75-80% of the total steps, leaving the remaining steps to the Refiner. So, in this example: 52 steps for the Base and 18 steps for the Refiner)
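
A tiny sketch of that split (pure arithmetic; mapping it onto an advanced sampler's start/end step settings is an assumption about your workflow):

```python
# 75-80% of the steps go to the Base model, the remainder to the Refiner.
total_steps = 70
base_steps = round(total_steps * 0.75)    # 52
refiner_steps = total_steps - base_steps  # 18

# Illustrative KSamplerAdvanced-style ranges:
#   Base:    start_at_step=0,          end_at_step=base_steps
#   Refiner: start_at_step=base_steps, end_at_step=total_steps
print(base_steps, refiner_steps)  # 52 18
```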

alessandroperilli avatar Oct 10 '23 10:10 alessandroperilli

M2 Pro, 16GB, Sonoma 14.1.1. I didn't upgrade to the torch nightly, but I can confirm that --force-fp16 works. It takes ~180 sec to generate with 20 steps of SDXL base and 5 steps of refiner.

If I try with --use-split-cross-attention --force-fp16, it gets slower, to ~200 sec.
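
That slowdown matches what split attention is for: it computes attention in query chunks to reduce peak memory, usually at some speed cost. A conceptual sketch, not ComfyUI's actual implementation:

```python
import torch

def chunked_attention(q, k, v, chunk=1024):
    """Attention computed in query slices: the full attention matrix is
    never materialized at once, lowering peak memory but adding overhead."""
    out = torch.empty_like(q)
    scale = q.shape[-1] ** -0.5
    for i in range(0, q.shape[-2], chunk):
        scores = (q[..., i:i + chunk, :] @ k.transpose(-2, -1)) * scale
        out[..., i:i + chunk, :] = scores.softmax(dim=-1) @ v
    return out
```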

Omhet avatar Nov 28 '23 20:11 Omhet

M3 Max, 64GB, only 11GB used, GPU bound. Using the SDXL Turbo fp16 checkpoint with --force-fp16, I get 2.45 it/s on the sample workflow provided for Turbo.

In this video, the guy gets 11 it/s on a 3090: https://www.youtube.com/watch?v=kApJkjjIhbs

pechaut78 avatar Nov 29 '23 23:11 pechaut78

I just tried ComfyUI on my Mac and I was surprised by how slow it is! macOS Sonoma 14.1.1, Mac mini M2 Pro, 16GB RAM, run with --force-fp16, sd_xl_base_1.0, 1344x768px, DPM++ 2s Ancestral, Karras, steps 20, CFG 8: ~190 sec

TabassomArgi avatar Jan 14 '24 23:01 TabassomArgi

I am having this problem too...

GeneralShan avatar Aug 24 '24 04:08 GeneralShan

Hello! I have tried for several days to run Wan2.2 on my Mac with no results, just a lot of errors... I mean, every step is an error. I just bought a Mac Studio with an M3 Ultra and 512GB of RAM, but I just can't create anything...

legeekcestchic avatar Oct 23 '25 09:10 legeekcestchic

I have tried every Wan-based template in ComfyUI, but nothing works :(

legeekcestchic avatar Oct 23 '25 09:10 legeekcestchic