Add documentation for the StableDiffusionImg2ImgPipeline tensor input format
So I was trying to feed StableDiffusionImg2ImgPipeline a video frame turned into a torch tensor instead of a PIL image as is typical, and I was struggling with garbled output until I eventually found this bit in the code: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L86
I think the docs should mention that the raw input image tensor should be in the [-1.0, 1.0] range, which isn't typical when (for instance) dumping floating-point frames from numpy or vapoursynth.
Just a tiny thing. But maybe it will save someone else the trouble.
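For anyone hitting the same issue, here's a minimal sketch of the conversion I ended up needing. The function name is my own; it just maps a HxWxC float frame in [0, 1] (the typical numpy/vapoursynth range) into the 1xCxHxW, [-1, 1] layout the pipeline's tensor path expects:

```python
import numpy as np
import torch


def frame_to_pipeline_tensor(frame: np.ndarray) -> torch.Tensor:
    """Convert an HxWxC float frame in [0, 1] to a 1xCxHxW tensor in [-1, 1]."""
    tensor = torch.from_numpy(frame).float()
    tensor = tensor.permute(2, 0, 1).unsqueeze(0)  # HWC -> 1CHW, add batch dim
    return tensor * 2.0 - 1.0  # rescale [0, 1] -> [-1, 1]
```

Without the final rescale, the pipeline interprets mid-gray (0.5) as near-white, which is exactly the garbled output I was seeing.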
+1 on this one. I think we should definitely improve the documentation and the robustness of all models that accept images.
In short, pipelines that accept images should:
- Be allowed to accept images in multiple formats (PIL, numpy, torch.Tensor)
- Be tested that all inputs give the same results
- Probably use a feature extractor class that centralizes all the logic. I don't think we need to make it depend on a config yet, but adding a new class for it would make a lot of sense.
- Have very nice documentation about it as I think many people are using this class to generate movies, etc...
cc @patil-suraj @williamberman
Yeah +1, there are already manual tests for the different input types in the unCLIP image variation pipeline. We could extract those into a mixin and reuse them with some modifications:
https://github.com/huggingface/diffusers/blob/a66f2baeb782e091dde4e1e6394e46f169e5ba58/tests/pipelines/unclip/test_unclip_image_variation.py#L237
https://github.com/huggingface/diffusers/blob/a66f2baeb782e091dde4e1e6394e46f169e5ba58/tests/pipelines/unclip/test_unclip_image_variation.py#L281
https://github.com/huggingface/diffusers/blob/a66f2baeb782e091dde4e1e6394e46f169e5ba58/tests/pipelines/unclip/test_unclip_image_variation.py#L313
https://github.com/huggingface/diffusers/blob/a66f2baeb782e091dde4e1e6394e46f169e5ba58/tests/pipelines/unclip/test_unclip_image_variation.py#L365
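A rough sketch of what such a mixin could look like (class and method names are hypothetical, not from the codebase): a helper that turns one reference image into all three accepted input variants, so each pipeline test can feed them through and assert the outputs match.

```python
import numpy as np
import torch
from PIL import Image


class ImageInputEquivalenceMixin:
    """Hypothetical test mixin: build PIL / numpy / tensor variants of one image."""

    def to_variants(self, array: np.ndarray):
        """`array` is HxWxC uint8; returns (PIL image, [0, 1] numpy, [-1, 1] tensor)."""
        pil = Image.fromarray(array)
        np_float = array.astype(np.float32) / 255.0
        # Tensor input is expected in 1xCxHxW layout and [-1, 1] range.
        tensor = torch.from_numpy(np_float).permute(2, 0, 1).unsqueeze(0) * 2.0 - 1.0
        return pil, np_float, tensor
```

A test using the mixin would run the pipeline once per variant (with a fixed seed) and assert the three outputs are numerically close, which is essentially what the linked unCLIP tests do by hand.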
We should solve this very soon; it's an important issue. This issue describes in more detail how to solve it: https://github.com/huggingface/diffusers/issues/2304
I will make this a priority for me next week.
cc @yiyixuxu here as it's related to #2304
Assigning @yiyixuxu here instead of me