diffusers Pipeline proposal: Depth2ImgVariation

Model/Pipeline/Scheduler description

It would be nice to have a pipeline that combines the StableDiffusionImageVariationPipeline with the StableDiffusionDepth2ImgPipeline, i.e. a pipeline that creates an image from a depth map, but guided by image embeddings rather than a prompt.

Open source status

[ ] The model implementation is available
[ ] The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

The use-case would be to provide a style image and a depth image to produce images with the content of the depth image, but the style of the style image.
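To make the intended data flow concrete, here is a minimal sketch of how the two conditioning signals might be assembled for a single denoising step. It assumes the depth map is concatenated to the latents as an extra channel (as the depth-to-image pipeline appears to do) and that the style image embedding is passed as a one-token cross-attention context (as the image-variation pipeline appears to do). The function name `build_unet_inputs` and all tensor sizes are hypothetical and only serve the illustration; this is not the implementation of either pipeline.

```python
import torch


def build_unet_inputs(latents, depth_map, image_embeds):
    """Assemble content (depth) and style (image embedding) conditioning.

    latents:      (B, 4, h, w) noisy image latents
    depth_map:    (B, 1, H, W) depth prediction at any resolution
    image_embeds: (B, D) pooled embedding of the style image
    """
    # Resize the depth map to the latent resolution.
    depth = torch.nn.functional.interpolate(
        depth_map, size=latents.shape[-2:], mode="bicubic", align_corners=False
    )
    # Normalise each depth map to the range [-1, 1].
    d_min = depth.amin(dim=(1, 2, 3), keepdim=True)
    d_max = depth.amax(dim=(1, 2, 3), keepdim=True)
    depth = 2.0 * (depth - d_min) / (d_max - d_min).clamp_min(1e-8) - 1.0

    # Content: depth becomes a fifth input channel next to the 4 latent channels.
    sample = torch.cat([latents, depth], dim=1)
    # Style: the embedding becomes a sequence of length 1 for cross-attention.
    encoder_hidden_states = image_embeds.unsqueeze(1)
    return sample, encoder_hidden_states


# Hypothetical sizes, chosen only to show the resulting shapes.
latents = torch.randn(2, 4, 8, 8)
depth_map = torch.rand(2, 1, 64, 64)
image_embeds = torch.randn(2, 768)
sample, context = build_unet_inputs(latents, depth_map, image_embeds)
print(sample.shape, context.shape)
```

A UNet for this setup would then need to accept five input channels and a cross-attention context whose width matches the image embedding.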

While the code of both pipelines seems to be pretty easy to merge, I'm wondering:

Would the concept work?
Is there a clever way around retraining?

Feedback is much appreciated, and if the approach is promising, I would be happy to attempt an implementation myself.

Thanks!

Jan 13 '23 17:01 maxfrei750

Sounds like a cool project for the community! Anybody interested in taking this one?

Jan 16 '23 11:01 patrickvonplaten

@patrickvonplaten Thanks for the kind words. So you think that it should work as imagined?

Concerning the implementation: I'd be happy to write the code, which should be just a combination of the StableDiffusionImageVariationPipeline with the StableDiffusionDepth2ImgPipeline, right?

My main concern is the training, since I lack access to the required hardware and don't know whether any strategies can be applied to avoid a full re-training, or which ones. Do you have any guidance, e.g., on whether only some of the weights need to be fine-tuned?
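One way to reason about which weights cannot be reused is a simple shape check on the two candidate starting points. The sketch below is pure bookkeeping, not a training recipe. The numbers are assumptions taken from my reading of the published configs (depth UNet: 5 input channels, cross-attention width 1024; image-variation UNet: 4 input channels, cross-attention width 768; image embedding width 768) and should be verified against the actual `config.json` files. `UNetSpec` and `retraining_needs` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UNetSpec:
    in_channels: int
    cross_attention_dim: int


def retraining_needs(unet, embed_dim, extra_cond_channels, latent_channels=4):
    """List the UNet parts whose shapes do not fit the proposed conditioning."""
    issues = []
    if unet.in_channels != latent_channels + extra_cond_channels:
        issues.append("input convolution: channel count does not fit latents + depth")
    if unet.cross_attention_dim != embed_dim:
        issues.append("cross-attention: context width does not fit the image embedding")
    return issues


# Assumed config values; verify before relying on them.
depth_unet = UNetSpec(in_channels=5, cross_attention_dim=1024)
variation_unet = UNetSpec(in_channels=4, cross_attention_dim=768)

for name, spec in [("depth", depth_unet), ("variation", variation_unet)]:
    print(name, retraining_needs(spec, embed_dim=768, extra_cond_channels=1))
```

Under these assumptions, each starting point has exactly one mismatched part, which hints at where fine-tuning would have to begin. A shape match is necessary but not sufficient: a UNet trained on text embeddings would presumably still need fine-tuning to interpret image embeddings.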

Jan 16 '23 11:01 maxfrei750

Hey @maxfrei750,

Sure, I think there is no reason why this shouldn't work. All you will have to fine-tune is the UNet (no text encoder, etc.).
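In PyTorch terms, fine-tuning only the UNet usually means freezing every other component and handing just the UNet parameters to the optimizer. The following is a minimal sketch with tiny `nn.Linear` stand-ins instead of real models. The component names, the helper `trainable_unet_params`, and the learning rate are assumptions for illustration.

```python
import torch
from torch import nn


def trainable_unet_params(components, trainable="unet"):
    """Freeze every component except `trainable` and return its parameters."""
    for name, module in components.items():
        # requires_grad_ toggles gradient tracking for all parameters of a module.
        module.requires_grad_(name == trainable)
    return [p for p in components[trainable].parameters() if p.requires_grad]


# Tiny stand-ins; a real setup would hold the actual pipeline components.
components = {
    "vae": nn.Linear(4, 4),
    "image_encoder": nn.Linear(4, 4),
    "depth_estimator": nn.Linear(4, 4),
    "unet": nn.Linear(4, 4),
}

params = trainable_unet_params(components)
# Only the UNet parameters are registered with the optimizer.
optimizer = torch.optim.AdamW(params, lr=1e-5)
```

The frozen components can additionally be run under `torch.no_grad()` during training to save memory.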

Jan 16 '23 14:01 patrickvonplaten

Also see: https://github.com/justinpinkney/stable-diffusion#image-variations

Jan 16 '23 14:01 patrickvonplaten

Hey @patrickvonplaten, thanks for the information. I was also looking around for a training script for Stable Diffusion with depth conditioning, to adapt it or at least see exactly how it was done. Unfortunately, I didn't find anything. Are you aware of such a resource?

Jan 21 '23 12:01 maxfrei750

Hmm, not really, sadly. Could you maybe try asking on our Discord? https://discord.gg/G7tWnz98XR

Jan 23 '23 07:01 patrickvonplaten

I'll try. Thanks for the hint!

Jan 23 '23 08:01 maxfrei750

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Feb 16 '23 15:02 github-actions[bot]