For Qwen Image Edit Previews Example Workflow Needed
Hi there @blepping...
I just saw your node pack. It looks like great work; from the nodes I can tell you bring advanced stuff to the table. Thanks for these...
I saw your reply here... https://github.com/madebyollin/taesd/issues/32
Please can you provide an example workflow to achieve this?
I mean to get live, high-quality previews with Qwen Image Edit. I am lost about whether I should use KSampler and connect something to it, or whether we need some custom sampler, etc...
Can you help me (or us, the community) understand this, with a workflow or the node connections to use...
I have downloaded the smaller TAE VAE files and put them in the folder. I am lost...
Another question, if suitable: is there any workaround you know of for Qwen Image Edit to use Sage attention without producing "black" images?
> Please can you provide an example workflow to achieve this?
There's actually no workflow to show for that feature, if you have ComfyUI-bleh installed then it should take over previewer functionality automatically.
- Make sure the node pack actually loaded successfully: no errors related to ComfyUI-bleh in your startup messages, and you should be able to add the nodes documented in the main README to your workflow.
- Make sure your preview type is set to `taesd` rather than `none` or `latent2rgb`. This is a ComfyUI setting, not one specific to this node pack. If you use ComfyUI Manager, I believe it will let you set the preview type.
- Some other node packs might also try to overwrite the default previewer; unfortunately, if that's the case then you need to choose which one you want to use. The only one I'm aware of that does this is VideoHelperSuite, if you have animated previews turned on.
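If you start ComfyUI from the command line, the preview method can also be set with a launch flag instead of through the Manager (this is ComfyUI's own CLI option; run `python main.py --help` to confirm the accepted values on your version):

```shell
# From inside your ComfyUI checkout; taesd is one of the accepted
# values alongside none, auto, and latent2rgb.
python main.py --preview-method taesd
```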
> Another question is if suitable, is there any workaround for Qwen Image Edit to use Sage attention without producing "black" images that you know of?
Hmm, I've never seen that (but I've only used normal Qwen Image, not the Edit version). Are you using Sage through my nodes (BlehSageAttentionSampler)? Just a random guess, but maybe it doesn't like the dtype you're running the model in. ComfyUI has a builtin ModelComputeDtype node that can be used to override whatever default dtype it uses. You are most likely using either float16 or bfloat16 (there should be some console messages about what it uses when you load the model or first use it), so you could try setting it to the opposite. Note that while this might let you use SageAttention, ComfyUI usually chooses what it thinks is best for your GPU and the model, so overriding it could hurt overall performance.
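As a general note (not something confirmed for this specific model): a common cause of black outputs in half precision is float16 overflow, where activations past float16's maximum become inf/NaN. bfloat16 trades mantissa precision for float32's exponent range, which is why flipping the compute dtype sometimes helps. A toy illustration using the published range limits:

```python
FP16_MAX = 65504.0    # largest finite float16 value (IEEE 754 binary16)
BF16_MAX = 3.39e38    # bfloat16 shares float32's exponent range (approx.)

activation = 70000.0  # hypothetical intermediate value inside the model
print(activation > FP16_MAX)  # True: would overflow to inf in float16
print(activation > BF16_MAX)  # False: still representable in bfloat16
```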
Thank you for the detailed reply. I will try to set it up as you described and see what comes out of it, and whether I can get it right.
No problem. By the way, I just tried Qwen Image Edit, and I can confirm it works fine. Also, using BlehSageAttentionSampler does not seem to cause any issues (you'll need to use SamplerCustom or SamplerCustomAdvanced for sampling, or any sampling node that has a SAMPLER input). The black image problem might be because there's other stuff, like text encoders, that doesn't like Sage. One nice thing about using it to wrap a sampler is that you can control exactly when and where it gets used, which makes it less likely to run into those sorts of problems.
This is the basic idea:
If you don't want to use CFG (CFG 1), you can use BasicGuider instead.
Thank you for your detailed guidance. I will recheck the README first, then try to replicate a similar workflow just to get things going.
I made the config with both YAML and JSON, but used the YAML one as you suggested. I first tried KSampler but it gave an error. Then I applied your preview patch to KSampler, which forced it to use proper previewing and got rid of the error.
After the first successful try, I decided to use the custom sampler with the Sage attention patch and the newly released Qwen Image Edit 2509 Nunchaku version. Oh boy! Even without the lightning LoRA, which isn't implemented yet for the Nunchaku version (awaiting a PR to be merged for LoRA support), it was extremely fast.
Console output for 8 steps on a 3090:

```
loaded completely 21460.64670448303 7909.737449645996 True
Requested to load NunchakuQwenImage
Refreshing previewer
100%|████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:15<00:00, 1.95s/it]
Requested to load WanVAE
loaded completely 600.0444679260254 242.02829551696777 True
Prompt executed in 33.60 seconds
```
So once the 4-step LoRA is available, it will be even faster.
Just want to say thank you for your node pack; I will try some of the other nodes it offers. Also, the previews are working amazingly.
Edit: Just wanted to share one result, as I am very impressed by how accurately the Qwen Image Edit model follows prompts. Prompt: make her wear glasses and change her hair and eyebrow color to blue. change her hairs style to cyberpunk raven hair style. remove collar on her neck. put her in a distance and she is riding a mechanical horse made of cogs.futuristic clockwork city background. make the image look photo realistic. slight depth of field. change the horse head with a dragon head.
Nice, glad it's working for you now! I've only had a little time to play with it but I was impressed as well.
If you're using Nunchaku then the SageAttentionSampler node probably isn't having an effect. It only does something if ComfyUI's default attention function gets called, and I'm guessing Nunchaku probably wouldn't be calling that. I haven't looked at the Nunchaku code though, so it is possible. One way to verify this would be to attach a text widget to the yaml_parameters input and turn on verbose mode:
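For example, the text connected to yaml_parameters could be a single YAML line like the one below (the exact key name is my assumption; check the ComfyUI-bleh README for the supported options):

```yaml
# Hypothetical parameter name; consult the node pack's README.
verbose: true
```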
It should spam your console with stuff about SageAttention while you run the model if it's actually getting used. You don't have to care about the content, just seeing any output from that would indicate that Nunchaku actually is using ComfyUI's attention mechanism.
(You don't need to use a multiline text node there since it's only one line of config parameters, so something like ComfyUI's builtin string primitive node will work.)
Ahh. Actually, I got so happy that the preview was working that I rushed to the conclusion that Sage attention was working too. What made me think that was Nunchaku's own crazy inference speed. My global launch script was set to use Flash attention. So with Flash attention, the preview was actually working but only partially updating. With your node pack, I now get a very high quality representation and the preview updates every step.
So:

- Global Flash attention + your pack = good
- Global Sage attention (with or without your node pack, with or without the sampler Sage attention patch) = no preview (completely dark) and no output (completely dark)
- Flash attention + sampler Sage attention patch = gives preview and result
- Flash attention only = gives preview and result
So the Sage attention patch is possibly not working with Nunchaku.
I asked Gemini (with deep research) about this, and it said Qwen Image Edit is incompatible with Sage attention. BTW, the Qwen Image Edit 2509 version got an update from the Nunchaku team with the lightning LoRA baked in.
It got an even crazier speed boost with 4-8 steps at rank 32-128. They plan to release complete LoRA support by the end of this month, as an ETA. It's a pretty unique model that even understands ControlNet images for pose estimation, surpassing closed-source models in that ability. Just for your information.
I really need to dig into your node pack, as it offers really good stuff judging from the README I have read. It will take some time to understand the logic behind the nodes.
I may ask some other questions in the future, if you don't mind. I won't ask until I'm really stuck, as I don't want to waste your time.
Q: Just for brainstorming, do you think it is possible to use a refiner model, e.g. using the Nunchaku version of the Qwen Image Edit model as a base and the Nunchaku version of an SDXL model as the refiner? Different architecture, different VAE, etc. Or even, let's say we do 10 steps in total: 8 steps from Qwen and the last 2 steps from SDXL? (This sounds so dumb, I know, like jumping from one latent space to another.) Of course, we can do image-to-image piped in at the end...
A: My question got answered: That's an excellent and insightful question that gets to the heart of some advanced concepts in generative AI. Your "dumb" question is not dumb at all; it's the critical challenge in what you're proposing. Here's a breakdown of the possibilities and challenges of combining models like Qwen Image Edit and SDXL.
The Refiner Model Concept
Using a base model and a refiner is a common and powerful technique. The standard SDXL workflow, for instance, often uses a base model for the initial generation and then a separate refiner model to add high-frequency details. So, your core idea is sound.
However, the challenge, as you correctly identified, lies in the fact that you're proposing to use two models with different architectures, VAEs, and latent spaces.
The Latent Space Problem: The Core Challenge
Your intuition about "jumping from one latent space to another" is spot on. Here's why that's problematic:
Incompatible Latent Spaces: Diffusion models don't work on pixels directly during the generation steps. They operate in a compressed "latent space." The Qwen model and the SDXL model will have different VAEs (Variational Autoencoders) and therefore, fundamentally different and incompatible latent spaces.
The 8 + 2 Step Idea: You can't simply run 8 steps with Qwen and then pipe the resulting latent representation into SDXL for the final 2 steps. The data structure and "language" of the latent space would be completely different, and SDXL wouldn't know how to interpret it. It would be like writing the first eight chapters of a book in English and the last two in a completely different, unrelated language.
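As a rough sketch of the mismatch (the channel counts here are my assumption based on the respective VAE designs: SDXL's VAE emits 4-channel latents, while the Wan-style VAE that Qwen Image uses emits 16-channel latents; both downsample 8x spatially):

```python
# Hypothetical latent tensor shapes for a 1024x1024 image,
# laid out as (batch, channels, height/8, width/8).
h, w = 1024, 1024
sdxl_latent_shape = (1, 4, h // 8, w // 8)
qwen_latent_shape = (1, 16, h // 8, w // 8)
print(sdxl_latent_shape)  # (1, 4, 128, 128)
print(qwen_latent_shape)  # (1, 16, 128, 128)
```

Even if the spatial sizes line up, the channel count and the learned meaning of each channel differ, so neither model can denoise the other's latent.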
The Viable Solution: Image-to-Image Pipeline
Your final thought is the most practical and widely used solution: an image-to-image pipeline. This approach would work as follows:
Generate the Base Image: You would use the Qwen Image Edit model to generate the initial image, taking advantage of its strengths.
Use Image-to-Image with the Second Model: You would then take the fully generated image from Qwen and use it as the input for an image-to-image task with the SDXL model. This would allow SDXL to refine the image, adding its own stylistic characteristics.
This method works because the models are communicating via pixels, which is a universal format, rather than through their internal, model-specific latent representations.
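The handoff can be sketched roughly like this; every function below is a placeholder I made up to show the data flow, not a real ComfyUI or diffusers API:

```python
def qwen_generate(prompt):
    """Placeholder for the full Qwen Image Edit pass, ending with its
    own VAE decode so the output is ordinary pixel data."""
    return [[0.2, 0.4], [0.6, 0.8]]  # dummy 2x2 grayscale "image"

def sdxl_img2img(pixels, denoise):
    """Placeholder for an SDXL image-to-image pass: SDXL's VAE would
    re-encode the pixels into its own latent space before refining."""
    return [[min(1.0, p + 0.05 * denoise) for p in row] for row in pixels]

base = qwen_generate("mechanical horse in a clockwork city")
refined = sdxl_img2img(base, denoise=0.3)  # low denoise keeps composition
```

The denoise strength on the SDXL side is the main knob: low values preserve Qwen's composition, while higher values let SDXL restyle more aggressively.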
Nunchaku Versions
The "nunchaku" versions of these models are quantized, meaning they are optimized for faster and more memory-efficient inference. This doesn't change the fundamental architectural differences between the models, but it does make running complex pipelines with multiple models more feasible on consumer hardware.
In summary, while your idea of switching models mid-inference is not currently feasible due to the incompatibility of their latent spaces, your overall brainstorming is heading in the right direction. The concept of using different models sequentially to leverage their unique strengths is a powerful one, and the image-to-image pipeline is the standard and effective way to achieve it.
> It should spam your console with stuff about SageAttention while you run the model if it's actually getting used.
It did not show any output in the console.