[Feature]: Coerce resolution to multiple of 112 for qwen-edit
Feature description
Qwen Image Edit works best when the image resolution in both dimensions is a multiple of 112; otherwise the output is markedly distorted.
Note: there are multiple ways to do this: cropping, scaling, padding. Which one should be used?
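To make the three options concrete, here is a minimal sketch of the target size each one would produce. The function name `target_size` and its `mode` values are hypothetical, not SD.Next or Diffusers APIs, and it only computes dimensions; the actual crop, resize or pad would still be done by an image library.

```python
def target_size(width: int, height: int, multiple: int = 112, mode: str = "scale") -> tuple[int, int]:
    """Return the (width, height) an image would be coerced to.

    Hypothetical helper for illustration only:
      crop  -> round each side down (pixels are cut off)
      pad   -> round each side up (pixels are added)
      scale -> round each side to the nearest multiple (image is resized)
    """
    def down(v: int) -> int:
        return (v // multiple) * multiple

    def up(v: int) -> int:
        return -(-v // multiple) * multiple  # ceiling division

    def nearest(v: int) -> int:
        return ((v + multiple // 2) // multiple) * multiple

    if mode == "crop":
        w, h = down(width), down(height)
    elif mode == "pad":
        w, h = up(width), up(height)
    elif mode == "scale":
        w, h = nearest(width), nearest(height)
    else:
        raise ValueError(f"unknown mode: {mode}")

    # never return a zero-sized side for very small inputs
    return max(w, multiple), max(h, multiple)


for mode in ("crop", "pad", "scale"):
    print(mode, target_size(1024, 768, mode=mode))
```

For a 1024x768 input, cropping and padding move both sides in the same direction, while scaling to the nearest multiple can move them in opposite directions, which is where the aspect ratio changes.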
Version Platform Description
No response
multiple of 112? can you quote the reference for that info? also, that is a really big number. sdnext does coercing, but typically to 8/16/32 tops, so it's always doing a resize using the lanczos method. but coercing to 112 could significantly change the aspect ratio of the image. and no, i don't want to have a configurable "should we crop/scale/pad" option just to deal with qwen-image-edit deficiencies.
multiple of 112? can you quote the reference for that info?
https://www.reddit.com/r/StableDiffusion/comments/1myr9al/use_a_multiple_of_112_to_get_rid_of_the_zoom/ It has also been mentioned a few times in Comfy's Discord server. I tested it, and indeed, if the dimensions aren't multiples of 112, the output is very distorted.
The number 112 indeed seems weird, but it's due to how Qwen Image VAE and Qwen2.5-VL-7B work
If this is not implemented, this anomaly of Qwen Image Edit must be mentioned in the wiki at least
ok, so 112 is the least common multiple of x16 from the vae and x14 from the vlm. so how do you propose to deal with this? sdnext already does resize to match vae, so x16 is handled. but like i said, coercing to 112 would most of the time be pretty intrusive to do behind the scenes.
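A quick sketch of the arithmetic behind this comment, taking the x16 (VAE) and x14 (VLM) figures from the thread as given. The helpers `snap` and `aspect_drift` are hypothetical names used only for illustration, and `math.lcm` requires Python 3.9 or newer.

```python
import math

# 112 is the smallest size that is a multiple of both 16 and 14
print(math.lcm(16, 14))


def snap(value: int, multiple: int) -> int:
    """Round value to the nearest multiple (hypothetical helper)."""
    return max(multiple, ((value + multiple // 2) // multiple) * multiple)


def aspect_drift(width: int, height: int, multiple: int) -> float:
    """Relative change in aspect ratio after snapping both sides."""
    old_ratio = width / height
    new_ratio = snap(width, multiple) / snap(height, multiple)
    return abs(new_ratio / old_ratio - 1)


# 1024x768 is already aligned to 16, so nothing changes there,
# but snapping to 112 gives 1008x784 and shifts the aspect ratio.
for multiple in (16, 112):
    size = (snap(1024, multiple), snap(768, multiple))
    print(multiple, size, f"{aspect_drift(1024, 768, multiple):.2%}")
```

In this example the 112 case changes the aspect ratio by a few percent, which gives a rough sense of why a silent resize was considered intrusive.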
I believe my comment from the other thread about Qwen-Image-Edit is accurate here as well.
https://github.com/vladmandic/sdnext/issues/4261#issuecomment-3401547904
yes - and the latest writeup and the link to the diffusers issue do show more. question here is what we can do right now - do you have a proposal? i've been asking for one repeatedly.
You asked the other person, not me, I just wanted to chip in about the problems, as I haven't really thought hard about the solution for now.
But from my side I would advise caution about making changes to the pipeline or handling in SDnext, so that we don't end up accidentally obscuring or complicating issues as they get fixed upstream. I.e. leave a note on the wiki about the problems with Qwen at most, bear with it until the Diffusers changes get merged, and try to work with the Diffusers team to iron out the problems.
true - and thanks for the notes!
I think we should indeed mention this in the wiki (and create a wiki page for image editing models in the process) until the Diffusers implementation is fixed. It's probably not worth making a mitigation on the SD.Next side for it. Also, I tested a bit more and it's indeed more complicated than just 112, e.g. an 896*896 image also causes this bug, while a 1024*1024 image doesn't.
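As a small check of why those two sizes don't fit the "multiple of 112" rule, the sketch below only does the divisibility arithmetic; it says nothing about what actually triggers the bug. The helper name `describe` is hypothetical.

```python
def describe(side: int, multiple: int = 112) -> str:
    """Report whether a side length is a multiple of `multiple`."""
    quotient, remainder = divmod(side, multiple)
    if remainder == 0:
        return f"{side} = {quotient} * {multiple} (aligned)"
    return f"{side} = {quotient} * {multiple} + {remainder} (not aligned)"


for side in (896, 1024):
    print(describe(side))
```

896 is an exact multiple of 112 and 1024 is not, so the reported behavior is the opposite of what the rule alone would predict.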
i'm totally down for having a new wiki page for image editing models in general. did i hear someone volunteer to write one? :)