
Create VAE feature extractor class

Open patrickvonplaten opened this issue 3 years ago • 4 comments

We currently do not have a unified API that guarantees that, for pipelines accepting image inputs, the input and output image formats always stay the same. This issue comment shows the problem nicely: https://github.com/huggingface/diffusers/issues/1882#issuecomment-1416117217

We should make sure that:

  • a) All pipelines that accept images can handle images of type PIL, numpy and torch
  • b) If images are passed in numpy or torch, the input format (image scale) should match the output format 1-to-1
  • c) Pipelines should be able to return images in PT format besides PIL and numpy, so that one can run multiple image-to-image generations on GPU
  • d) There is a lot of boilerplate code around "preparing images" and "preparing masks" => we should unify this code in a feature extractor, as it is nearly always the same
  • e) Tests verify that pipelines give the same results for all image input types
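
To make points (a) to (c) concrete, here is a minimal sketch of the round trip in point (b). It is not the diffusers API: the class name `ToyImageProcessor` is hypothetical, only numpy input is handled (PIL and torch branches are left out), and it assumes the model expects channel-first images scaled to [-1, 1].

```python
import numpy as np


class ToyImageProcessor:
    """Hypothetical, weight-less helper; not the actual diffusers API."""

    def preprocess(self, image: np.ndarray) -> np.ndarray:
        # Accept a single HWC image or an NHWC batch with values in [0, 1].
        if image.ndim == 3:
            image = image[None, ...]
        if image.ndim != 4:
            raise ValueError("expected an HWC image or an NHWC batch")
        image = image.astype(np.float32)
        # NHWC -> NCHW, then rescale [0, 1] -> [-1, 1] (assumed model range).
        return image.transpose(0, 3, 1, 2) * 2.0 - 1.0

    def postprocess(self, image: np.ndarray) -> np.ndarray:
        # Invert preprocess: [-1, 1] -> [0, 1], then NCHW -> NHWC.
        image = np.clip((image + 1.0) / 2.0, 0.0, 1.0)
        return image.transpose(0, 2, 3, 1)


processor = ToyImageProcessor()
batch = np.random.default_rng(0).random((2, 4, 4, 3), dtype=np.float32)
model_input = processor.preprocess(batch)
restored = processor.postprocess(model_input)
```

If the input and output formats match 1-to-1, `restored` should equal `batch` up to floating-point error, which is the kind of check point (e) asks for.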

This change will require a more involved PR, but it's time to tackle this! It would greatly help users who use img-2-img to make movies, etc.

patrickvonplaten avatar Feb 09 '23 11:02 patrickvonplaten

@williamberman @yiyixuxu @pcuenca let me know if such a PR interests you, otherwise I can try to tackle it in 1-2 weeks :-)

patrickvonplaten avatar Feb 09 '23 11:02 patrickvonplaten

@patrickvonplaten I believe last time we talked about this, the idea was to use an image processing mixin that we would pull the common image processing code into. Is that still the case?

williamberman avatar Feb 11 '23 18:02 williamberman

Well, actually I don't think we have to use a Mixin class. I think we could just create a "weight"-less and "config"-less VAEFeatureExtractor class that is instantiated in all the respective pipelines as:

self.vae_extractor = VAEExtractor(image_size=..., ...)
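
As a rough illustration of that idea (a sketch under assumptions, not the implementation that was eventually merged): the extractor is a plain Python object with no weights and no saved config, created in the pipeline constructor. The `multiple_of` parameter, the `target_size` method and the `ToyImg2ImgPipeline` class are hypothetical names introduced only for this example.

```python
class VAEExtractor:
    """Hypothetical sketch: plain Python object, no weights, no saved config."""

    def __init__(self, image_size: int = 512, multiple_of: int = 8):
        self.image_size = image_size
        # Hypothetical parameter: spatial sizes are rounded down to this multiple.
        self.multiple_of = multiple_of

    def target_size(self, height=None, width=None):
        # Fall back to the default size, then round down to a valid multiple.
        height = height or self.image_size
        width = width or self.image_size
        m = self.multiple_of
        return height - height % m, width - width % m


class ToyImg2ImgPipeline:
    """Hypothetical pipeline showing where the extractor would live."""

    def __init__(self, image_size: int = 512):
        # Instantiated directly; nothing is loaded from a checkpoint.
        self.vae_extractor = VAEExtractor(image_size=image_size)


pipe = ToyImg2ImgPipeline()
size_default = pipe.vae_extractor.target_size()
size_custom = pipe.vae_extractor.target_size(515, 770)
```

Composition keeps the shared image-preparation code in one place without changing the inheritance chain of each pipeline, which is the trade-off against the Mixin approach raised above.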

patrickvonplaten avatar Feb 13 '23 11:02 patrickvonplaten

OK cool, will put it on my TODO list, but someone else feel free to pick it up if they have room :)

williamberman avatar Feb 16 '23 00:02 williamberman

@yiyixuxu feel free to take over :-)

patrickvonplaten avatar Mar 06 '23 20:03 patrickvonplaten

Being worked on in: https://github.com/huggingface/diffusers/pull/2617

patrickvonplaten avatar Mar 09 '23 12:03 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 02 '23 15:04 github-actions[bot]

I think this is done.

pcuenca avatar Apr 03 '23 14:04 pcuenca

@pcuenca I think we still need to add the feature extractor class to other pipelines, no?

@yiyixuxu is that correct?

williamberman avatar Apr 04 '23 17:04 williamberman