diffusers Create VAE feature extractor class

We currently do not have a unified API that guarantees that, for pipelines accepting image inputs, the input and output image formats always stay the same. This comment shows the problem nicely: https://github.com/huggingface/diffusers/issues/1882#issuecomment-1416117217

We should make sure that:

a) All pipelines that accept images can handle images of type PIL, numpy and torch
b) If images are passed as numpy or torch, the output format (image scale) should match the input format 1-to-1
c) Pipelines should be able to return images in PT format in addition to PIL and numpy, so that one can run multiple image-to-image generations on GPU
d) There is a lot of boilerplate code around "preparing images" and "preparing masks" => we should unify this code in a feature extractor, as it's nearly always the same
e) Test that pipelines give the same results for all image input types
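
To make points (b) and (e) a bit more concrete, here is a minimal sketch of the kind of round trip a shared feature extractor could guarantee. It is only an illustration, not the diffusers implementation: the function names `preprocess` and `postprocess`, the [0, 1] input range and the [-1, 1] model range are all assumptions, and only the numpy branch is shown (PIL and torch inputs would need their own branches).

```python
import numpy as np


def preprocess(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (H, W, C) or (N, H, W, C) floats in [0, 1]
    -> (N, C, H, W) floats in [-1, 1] (assumed model input range)."""
    image = np.asarray(image, dtype=np.float32)
    if image.ndim == 3:
        image = image[None, ...]  # add a batch dimension
    if image.ndim != 4:
        raise ValueError(f"expected 3 or 4 dimensions, got {image.ndim}")
    image = image.transpose(0, 3, 1, 2)  # channels-last -> channels-first
    return 2.0 * image - 1.0


def postprocess(image: np.ndarray) -> np.ndarray:
    """Hypothetical inverse: (N, C, H, W) in [-1, 1] -> (N, H, W, C) in [0, 1]."""
    image = (image / 2.0 + 0.5).clip(0.0, 1.0)
    return image.transpose(0, 2, 3, 1)  # channels-first -> channels-last


# The output should come back in the same layout and scale as the input.
rng = np.random.default_rng(0)
batch = rng.random((2, 8, 8, 3), dtype=np.float32)
model_input = preprocess(batch)
round_trip = postprocess(model_input)
```

A test along the lines of (e) could then assert that `round_trip` matches `batch` up to floating-point tolerance, and repeat the check for each supported input type.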

This change will require a more involved PR, but it's time to tackle this! It would greatly help users who use img-2-img to make movies etc.

Feb 09 '23 11:02 patrickvonplaten

@williamberman @yiyixuxu @pcuenca let me know if such a PR interests you, otherwise I can try to tackle it in 1-2 weeks :-)

Feb 09 '23 11:02 patrickvonplaten

@patrickvonplaten I believe last time we talked about this, the idea was to use an image processing mixin that we would pull the common image processing code into. Is that still the case?

Feb 11 '23 18:02 williamberman

Well, actually I don't think we have to use a Mixin class. I think we could just create a weight-less and config-less VAEFeatureExtractor class that is instantiated in all the respective pipelines as:

self.vae_extractor = VAEExtractor(image_size=..., ...)
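
As a rough sketch of what that composition could look like (instead of a mixin): the class below holds no weights and no saved config, only preprocessing logic, and the pipeline owns an instance of it. Everything here is hypothetical and for illustration only: the class name follows the snippet above, while the `preprocess`/`postprocess` methods, the nearest-neighbour resize, the [-1, 1] scaling and the `ToyImg2ImgPipeline` class are assumptions.

```python
import numpy as np


class VAEExtractor:
    """Hypothetical helper with no weights and no saved config."""

    def __init__(self, image_size: int = 64):
        self.image_size = image_size

    def preprocess(self, image: np.ndarray) -> np.ndarray:
        # Accept (H, W, C) or (N, H, W, C) floats in [0, 1] (assumed input scale).
        image = np.asarray(image, dtype=np.float32)
        if image.ndim == 3:
            image = image[None, ...]
        _, height, width, _ = image.shape
        # Nearest-neighbour resize to (image_size, image_size) via index lookup.
        rows = np.arange(self.image_size) * height // self.image_size
        cols = np.arange(self.image_size) * width // self.image_size
        image = image[:, rows][:, :, cols]
        # Channels-first layout and [-1, 1] scale (assumed model convention).
        return 2.0 * image.transpose(0, 3, 1, 2) - 1.0

    def postprocess(self, image: np.ndarray) -> np.ndarray:
        # Back to channels-last floats in [0, 1].
        image = (image / 2.0 + 0.5).clip(0.0, 1.0)
        return image.transpose(0, 2, 3, 1)


class ToyImg2ImgPipeline:
    """Stand-in pipeline that owns the extractor rather than inheriting from it."""

    def __init__(self, image_size: int = 64):
        self.vae_extractor = VAEExtractor(image_size=image_size)

    def __call__(self, image: np.ndarray) -> np.ndarray:
        model_input = self.vae_extractor.preprocess(image)
        # A real pipeline would encode, denoise and decode here; identity in this toy.
        return self.vae_extractor.postprocess(model_input)


pipe = ToyImg2ImgPipeline(image_size=4)
output = pipe(np.full((8, 8, 3), 0.25, dtype=np.float32))
```

With composition, every pipeline can share the same image-handling code without touching the class hierarchy.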

Feb 13 '23 11:02 patrickvonplaten

OK cool, I'll put it on my TODO list, but someone else feel free to pick it up if you have room :)

Feb 16 '23 00:02 williamberman

@yiyixuxu feel free to take over :-)

Mar 06 '23 20:03 patrickvonplaten

Being worked on in: https://github.com/huggingface/diffusers/pull/2617

Mar 09 '23 12:03 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Apr 02 '23 15:04 github-actions[bot]

I think this is done.

Apr 03 '23 14:04 pcuenca

@pcuenca I think we still need to add the feature extractor class to other pipelines, no?

@yiyixuxu is that correct?

Apr 04 '23 17:04 williamberman