Endpoint interface for inpainting models that require two images.
**Is your feature request related to a problem? Please describe.**
There doesn't appear to be an endpoint interface for Stable Diffusion inpainting models that require two image files: the base image and a mask.

**Describe the solution you'd like**
It would be handy to have an interface for these models so that the Hosted inference API widget would work on the model card views.

**Describe alternatives you've considered**
I recently had to create an endpoint for ControlNet inpainting: OrderAndChaos/controlnet-inpaint-endpoint. This was based on the philschmid/stable-diffusion-2-inpainting-endpoint endpoint.

**Additional context**
Originally asked here: https://github.com/huggingface/huggingface_hub/issues/1486
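For reference, a minimal sketch of how a two-image request payload could be shaped for such an endpoint. The field names (`inputs`, `image`, `mask`) and the base64-over-JSON transport are assumptions for illustration, not part of any official Inference API schema:

```python
import base64
import json


def build_payload(prompt: str, image_bytes: bytes, mask_bytes: bytes) -> str:
    """Client side: bundle the prompt plus both images into one JSON body.

    The field names here ("inputs", "image", "mask") are hypothetical;
    an actual endpoint handler would define its own contract.
    """
    return json.dumps({
        "inputs": prompt,
        "image": base64.b64encode(image_bytes).decode("utf-8"),
        "mask": base64.b64encode(mask_bytes).decode("utf-8"),
    })


def parse_payload(body: str) -> tuple[str, bytes, bytes]:
    """Server side: decode both images back into raw bytes before
    handing them to an inpainting pipeline."""
    data = json.loads(body)
    return (
        data["inputs"],
        base64.b64decode(data["image"]),
        base64.b64decode(data["mask"]),
    )
```

The point is only that a single JSON body can carry both images, so the widget would not need multipart uploads to support a base-image-plus-mask model.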
@pcuenca @mishig25
This seems like a valid use case, wdyt? Any models particularly fit for that? Maybe we should consider a highly diffusers-specific component (akin to Adobe?) that could be much more general (trying to limit the number of different widgets/pipelines).
AFAIK there's:
- Inpainting
- ControlNet (lots of possible additional masks/information)
- Outpainting
- Prompt + Prompt parsing
- Negative prompt
I'm not saying we should support everything; I'm listing things I'm aware of that could be nice to add, so we can try to design a single widget (at most a pair) that handles as many cases as possible.
There are some specialist inpainting models, but by and large most Stable Diffusion models are capable of many of the tasks @Narsil outlined, at least to some extent.
Perhaps we could generalize to some sort of `text_plus_image_to_image` task, where the nature of the input image depends on the particular model used (it could be a mask for inpainting, or a regular image for image-to-image generation, ...). Even so, it sounds tricky to cover all the flexibility of methods such as ControlNet.
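To make the generalization concrete, one way to model it is a per-model tag that tells the widget what role the uploaded image plays. Everything below (the role names, the model identifiers, the mapping) is purely illustrative, not an existing Hub mechanism:

```python
from enum import Enum


class ImageRole(Enum):
    # Hypothetical roles the single auxiliary image could play,
    # depending on the model behind the widget.
    MASK = "mask"              # inpainting: image marks the region to repaint
    INIT_IMAGE = "init_image"  # image-to-image: image seeds the generation
    CONTROL = "control"        # ControlNet: image is conditioning (edges, pose, ...)


# A hypothetical lookup the widget could consult to decide how to
# label and preprocess the uploaded image; keys are made-up model names.
MODEL_IMAGE_ROLE = {
    "example/stable-diffusion-inpainting": ImageRole.MASK,
    "example/stable-diffusion-img2img": ImageRole.INIT_IMAGE,
    "example/controlnet-canny": ImageRole.CONTROL,
}
```

A single widget plus a role tag like this would cover inpainting, img2img, and simple ControlNet cases without multiplying pipeline tasks, though it still wouldn't capture ControlNet setups that take several conditioning images at once.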
Is the final goal here a richer representation of ControlNet?
> Is the final goal here a richer representation of ControlNet?
I'm guessing the final goal is to showcase as best as possible what models can do. There's definitely a tradeoff between showcasing everything, and the bare minimum.
Currently it's the bare minimum, lacking the potential to show off some super nice properties. Spaces can handle arbitrarily complex interfaces; I'm mostly raising the question so we can think about whether something better than the current widget is possible.
Yes, of course!
My question was more about what models should be tagged with these new widgets vs just with the text-to-image task. If I understand it correctly, the widget to show is determined based on the pipeline_task, so it's a 1-to-1 relationship, is that right?
cc @apolinario, he always has good ideas about these things
> , so it's a 1-to-1 relationship, is that right?
Indeed!