api-inference-community

Endpoint interface for inpainting models that require two images.

Open OrderAndCh4oS opened this issue 2 years ago • 5 comments

Is your feature request related to a problem? Please describe. There doesn't appear to be an endpoint interface for Stable Diffusion inpainting models that require two image files, the base image and a mask.

Describe the solution you'd like It would be handy to have an interface for these models so that the Hosted inference API widget would work on the model card views.

Describe alternatives you've considered I recently had to create an endpoint for ControlNet inpainting: OrderAndChaos/controlnet-inpaint-endpoint. This was based on the philschmid/stable-diffusion-2-inpainting-endpoint endpoint.

Additional Context Originally asked here: https://github.com/huggingface/huggingface_hub/issues/1486
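Custom endpoints like the ones referenced typically work around the missing interface by base64-encoding both images into a single JSON payload. A minimal sketch of that pattern, assuming illustrative field names ("inputs", "image", "mask_image") rather than any official schema:

```python
import base64
import json


def build_inpaint_payload(image_bytes: bytes, mask_bytes: bytes, prompt: str) -> str:
    """Pack a base image, a mask, and a prompt into one JSON payload.

    Field names here are illustrative; a real custom endpoint handler
    defines its own schema.
    """
    return json.dumps({
        "inputs": prompt,
        "image": base64.b64encode(image_bytes).decode("utf-8"),
        "mask_image": base64.b64encode(mask_bytes).decode("utf-8"),
    })


def parse_inpaint_payload(payload: str):
    """Inverse of build_inpaint_payload, as a server-side handler might do."""
    data = json.loads(payload)
    image = base64.b64decode(data["image"])
    mask = base64.b64decode(data["mask_image"])
    return image, mask, data["inputs"]
```

On the server side, the decoded bytes would then be loaded into images and passed to an inpainting pipeline; the widget discussed in this issue would essentially need a UI that produces this kind of two-image payload.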

OrderAndCh4oS, May 30 '23 21:05

@pcuenca @mishig25

This seems like a valid use case, wdyt? Any models particularly fit for that? Maybe we should consider a highly diffusers-specific component (akin to Adobe?) that could be much more general (trying to limit the number of different widgets/pipelines).

AFAIK there's:

  • Inpainting
  • ControlNet (lots of possible additional masks/information)
  • Outpainting
  • Prompt + Prompt parsing
  • Negative prompt

I'm not saying we should support everything; I'm listing things I'm aware of that could be nice to add, so we can try to think of a single widget (at most a pair) that handles as many cases as possible.
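One way to compare the variants listed above is by which inputs each one adds on top of a plain prompt. A sketch, with task and field names that are illustrative rather than an existing HF spec:

```python
# Rough map of the diffusion task variants discussed above to the inputs
# they need. Names are illustrative, not an existing widget specification.
DIFFUSION_TASK_INPUTS = {
    "text-to-image": {"prompt", "negative_prompt"},
    "inpainting": {"prompt", "negative_prompt", "image", "mask_image"},
    "outpainting": {"prompt", "negative_prompt", "image", "mask_image"},
    # ControlNet can take many kinds of conditioning images (pose, depth, ...)
    "controlnet": {"prompt", "negative_prompt", "image", "control_image"},
}


def widget_fields(task: str) -> set:
    """Inputs a general widget would need to expose for a given task.

    Unknown tasks fall back to a bare prompt.
    """
    return DIFFUSION_TASK_INPUTS.get(task, {"prompt"})
```

Seen this way, a single general widget mostly needs a prompt box, an optional negative-prompt box, and zero, one, or two image slots, which is the consolidation being discussed.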

Narsil, Jun 05 '23 14:06

There are some specialist in-painting models, but by and large most stable diffusion models are capable of many of the tasks @Narsil outlined, at least to some extent.

Perhaps we could generalize to some sort of text_plus_image_to_image task, where the nature of the input image depends on the particular model used (it could be a mask for in-painting, or a regular image for image-to-image generation, ...). Even so, it sounds tricky to cover all the flexibility used in methods such as ControlNet.

Is the final goal here a richer representation of ControlNet?

pcuenca, Jun 05 '23 15:06

Is the final goal here a richer representation of ControlNet?

I'm guessing the final goal is to showcase as well as possible what models can do. There's definitely a tradeoff between showcasing everything and the bare minimum.

Currently it's the bare minimum, which misses the chance to show off some super nice properties (Spaces can handle arbitrarily complex interfaces). I'm mostly raising the question so we can consider whether something better than the current widget is possible.

Narsil, Jun 06 '23 14:06

Yes, of course!

My question was more about what models should be tagged with these new widgets vs just with the text-to-image task. If I understand it correctly, the widget to show is determined based on the pipeline_task, so it's a 1-to-1 relationship, is that right?
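The 1-to-1 relationship being asked about can be sketched as a simple lookup from a model's pipeline_tag to the single widget shown on its card. The widget names and the lookup itself are illustrative; the real mapping lives in the Hub frontend, not in this form:

```python
# Sketch of the 1-to-1 relationship discussed: each pipeline_tag selects
# at most one widget. Widget names here are illustrative placeholders.
PIPELINE_TAG_TO_WIDGET = {
    "text-to-image": "text-to-image widget",
    "image-to-image": "image-to-image widget",
    "image-classification": "image-classification widget",
}


def widget_for(pipeline_tag):
    # Unknown or untagged models simply get no widget.
    return PIPELINE_TAG_TO_WIDGET.get(pipeline_tag)
```

Under this scheme, a new two-image inpainting widget would require either a new pipeline_tag or a more general widget reused across several tags, which is exactly the tagging question raised above.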

cc @apolinario, he always has good ideas about these things

pcuenca, Jun 06 '23 15:06

so it's a 1-to-1 relationship, is that right?

Indeed!

Narsil, Jun 07 '23 06:06