Add documentation for the StableDiffusionImg2ImgPipeline tensor input format
So I was trying to feed StableDiffusionImg2ImgPipeline a video frame turned into a torch tensor instead of a PIL image as is typical, and I was struggling with garbled output until I eventually found this bit in the code: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L86
I think the docs should mention that the raw input image tensor should be in the [-1.0, 1.0] range, which isn't typical when (for instance) dumping floating-point frames from numpy or vapoursynth.
Just a tiny thing. But maybe it will save someone else the trouble.
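For anyone hitting the same issue, here's a minimal sketch of the conversion I ended up needing. The function name is my own; it just maps a HxWxC float frame in [0, 1] (the typical numpy/vapoursynth range) into the 1xCxHxW, [-1, 1] layout the pipeline's tensor path expects:

```python
import numpy as np
import torch


def frame_to_pipeline_tensor(frame: np.ndarray) -> torch.Tensor:
    """Convert an HxWxC float frame in [0, 1] to a 1xCxHxW tensor in [-1, 1]."""
    tensor = torch.from_numpy(frame).float()
    tensor = tensor.permute(2, 0, 1).unsqueeze(0)  # HWC -> 1CHW, add batch dim
    return tensor * 2.0 - 1.0  # rescale [0, 1] -> [-1, 1]
```

Without the final rescale, the pipeline interprets mid-gray (0.5) as near-white, which is exactly the garbled output I was seeing.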
+1 on this one. I think we should definitely improve the documentation and the robustness of all models that accept images.
In short, pipelines that accept images should:
- Be allowed to accept images in multiple formats (PIL, numpy, torch.Tensor)
- Be tested that all inputs give the same results
- Probably use a feature extractor class that centralizes all the logic. I don't think we need to make it depend on a config yet, but adding a new class for it would make a lot of sense.
- Have very nice documentation about it as I think many people are using this class to generate movies, etc...
cc @patil-suraj @williamberman
Yeah +1, there are already manual tests for the different input types in the unCLIP image variation pipeline. We could extract those into a mixin and reuse them with some modifications:
https://github.com/huggingface/diffusers/blob/a66f2baeb782e091dde4e1e6394e46f169e5ba58/tests/pipelines/unclip/test_unclip_image_variation.py#L237
https://github.com/huggingface/diffusers/blob/a66f2baeb782e091dde4e1e6394e46f169e5ba58/tests/pipelines/unclip/test_unclip_image_variation.py#L281
https://github.com/huggingface/diffusers/blob/a66f2baeb782e091dde4e1e6394e46f169e5ba58/tests/pipelines/unclip/test_unclip_image_variation.py#L313
https://github.com/huggingface/diffusers/blob/a66f2baeb782e091dde4e1e6394e46f169e5ba58/tests/pipelines/unclip/test_unclip_image_variation.py#L365
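A rough sketch of what such a mixin could look like (class and method names are hypothetical, not from the codebase): a helper that turns one reference image into all three accepted input variants, so each pipeline test can feed them through and assert the outputs match.

```python
import numpy as np
import torch
from PIL import Image


class ImageInputEquivalenceMixin:
    """Hypothetical test mixin: build PIL / numpy / tensor variants of one image."""

    def to_variants(self, array: np.ndarray):
        """`array` is HxWxC uint8; returns (PIL image, [0, 1] numpy, [-1, 1] tensor)."""
        pil = Image.fromarray(array)
        np_float = array.astype(np.float32) / 255.0
        # Tensor input is expected in 1xCxHxW layout and [-1, 1] range.
        tensor = torch.from_numpy(np_float).permute(2, 0, 1).unsqueeze(0) * 2.0 - 1.0
        return pil, np_float, tensor
```

A test using the mixin would run the pipeline once per variant (with a fixed seed) and assert the three outputs are numerically close, which is essentially what the linked unCLIP tests do by hand.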
We should solve this very soon; it's an important issue. This issue describes in more detail how to solve it: https://github.com/huggingface/diffusers/issues/2304
I will make this a priority for me next week.
cc @yiyixuxu here as it's related to #2304
Assigning @yiyixuxu here instead of me