diffusers New Pipeline: Tiled-upscaling with depth perception to avoid blurry spots

Hey AI Trainers and other peculiarities.

This is my first PR to Diffusers (I think; at least it's the first with an actual new feature). It adds an upscaling pipeline that works through the image tile by tile, trading extra compute for lower VRAM use. That allows virtually unlimited upscaling: it can go as big as 8K in a matter of minutes on a single 3080 with few noticeable artifacts.
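To give a rough idea of the tiling part, here is a minimal sketch of how an image could be split into overlapping tiles so that only one tile has to be in memory at a time. This is an illustration and not the code in this PR: the function name `tile_boxes`, the tile size of 128 and the overlap of 32 are all assumptions made for the example.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixel coordinates
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile_size: int = 128, overlap: int = 32) -> List[Box]:
    """Return overlapping tile boxes that together cover the whole image.

    Hypothetical helper for illustration; the sizes are assumed values.
    """
    if overlap < 0 or tile_size <= overlap:
        raise ValueError("tile_size must be larger than a non-negative overlap")
    stride = tile_size - overlap

    def starts(length: int) -> List[int]:
        # A single tile is enough when the image is smaller than one tile.
        if length <= tile_size:
            return [0]
        positions = list(range(0, length - tile_size, stride))
        # The last tile is aligned with the image edge so nothing is cut off.
        positions.append(length - tile_size)
        return positions

    return [
        (x, y, min(x + tile_size, width), min(y + tile_size, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

Each box can then be cropped, upscaled on its own and pasted into the output, so peak memory depends on the tile size rather than on the size of the full image.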

It started as a research project with its own model, and I managed to port the feature to Stable Diffusion 2 upscaling, with very cool results.

It was first introduced in my own Discord bot as a test feature. In the meantime I learned how pipelines work, and I have tried, to the best of my knowledge, to make it available for Diffusers.

This is my first pipeline, so please be gentle with reviews and feedback. I'm willing to learn to contribute in a more standard fashion, so I'll apply any feedback that makes sense.

The main thing I'm uncertain about is that the contribution guidelines state:

Self-contained: A pipeline shall be as self-contained as possible. More specifically, this means that all functionality should be either directly defined in the pipeline file itself, should be inherited from (and only from) the DiffusionPipeline class or be directly attached to the model and scheduler components of the pipeline.

Since my code borrows a lot from the original upscaling code, I could either copy that code and add my features to it, or simply reference pipeline_stable_diffusion_upscale. I chose the latter because it was less clunky and faster to do.
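As a toy illustration of that choice (not the actual pipeline code), the sketch below reuses an existing upscaler through inheritance instead of copying its logic. Both classes are hypothetical stand-ins: `BaseUpscaler` plays the role of the existing upscale pipeline, and the "image" is just a list of numbers.

```python
from typing import List


class BaseUpscaler:
    """Hypothetical stand-in for an existing upscaling pipeline."""

    scale = 4

    def __call__(self, samples: List[int]) -> List[int]:
        # Toy "upscaling": repeat every sample `scale` times.
        return [s for s in samples for _ in range(self.scale)]


class TiledUpscaler(BaseUpscaler):
    """Hypothetical tiled variant that reuses the parent's upscaling per tile."""

    def __init__(self, tile_size: int = 2) -> None:
        self.tile_size = tile_size

    def __call__(self, samples: List[int]) -> List[int]:
        out: List[int] = []
        for start in range(0, len(samples), self.tile_size):
            tile = samples[start:start + self.tile_size]
            # Delegate the actual upscaling to the inherited implementation.
            out.extend(super().__call__(tile))
        return out
```

The subclass stays short because it only adds the tiling loop, but it is no longer self-contained in the sense of the guideline quoted above, which is the trade-off in question.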

Given my inexperience with contributing pipelines, and the changes that may be needed for the pipeline's "self-contained-ness", I consider this PR an ongoing discussion.

The pipeline file contains a __main__ entry point that can be run from the CLI as a demo. The example code is:

import torch
from PIL import Image

# Assumes this runs inside the pipeline file, where StableDiffusionTiledUpscalePipeline is defined.
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipe = StableDiffusionTiledUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
image = Image.open("../../docs/source/imgs/diffusers_library.jpg")

def callback(obj):
    # Print the current progress and save the intermediate image.
    print(f"progress: {obj['progress']:.4f}")
    obj['image'].save("diffusers_library_progress.jpg")

final_image = pipe(image=image, prompt="Black font, white background, vector", noise_level=40, callback=callback)
final_image.save("diffusers_library.jpg")
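For readers wondering how such a callback could be driven, here is a minimal sketch of a tile loop that reports progress after each tile. Only the 'progress' and 'image' keys come from the example above. The function `process_tiles`, the progress being a fraction between 0 and 1, and the 'image' entry being a plain list of finished tiles (instead of a PIL image) are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional


def process_tiles(
    tiles: List[Any],
    upscale_tile: Callable[[Any], Any],
    callback: Optional[Callable[[Dict[str, Any]], None]] = None,
) -> List[Any]:
    """Upscale tiles one by one and report progress after each tile (hypothetical helper)."""
    results: List[Any] = []
    for index, tile in enumerate(tiles, start=1):
        results.append(upscale_tile(tile))
        if callback is not None:
            # Assumed contract: progress is the fraction of tiles done so far,
            # and "image" holds a copy of the partial result.
            callback({"progress": index / len(tiles), "image": list(results)})
    return results
```

Calling the callback once per tile keeps the reporting cost low compared to the upscaling itself, while still letting the caller save partial results during a long run.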

I'm looking forward to feedback, and I hope I made something that could benefit others, too.

With that, let's take a look at some demo art:

Upscaled docs/source/imgs/diffusers_library.jpg:

"the legendary RockBeard, legend of a pirate, realistic portrait"

"Christmas tree sandwich, Subway, centered, realistic food photo, kodak ektar"

"rocky hot springs, waterfall, tropical, cobblestone path, blue lagoon nature, realistic photo, kodak ektar, clear, glistering water, volumetric lighting, raytraced water, bananas, apples, fresh"

"black smoke escaping out of a human mouth, diesel exhaust realistic portrait"

And my favorite, which is also the first image ever made with this algorithm, a grilled dragon:

"Grilled dragon, served on a plate, realistic food photo, kodak ektar, Michelin Star dish, centered"

Dec 08 '22 18:12 peterwilli

The documentation is not available anymore as the PR was closed or merged.

Dec 08 '22 18:12 HuggingFaceDocBuilderDev

Hey @peterwilli,

This looks super cool! Would you mind maybe adding your pipeline to the official table and a pipeline example here: https://github.com/huggingface/diffusers/blob/main/examples/community/README.md#community-examples

This would greatly help the community to use your pipeline :-)

Dec 12 '22 15:12 patrickvonplaten

Hey @patrickvonplaten, thanks for the kind words! I'm currently making these examples, but I'm stuck on how to preload my pipeline. I'm getting strange errors about shutil, and I'm wondering if you can help me out.

I have a colab here: https://colab.research.google.com/drive/1Zlvi64ZkQUarqiAFzyywPRSLUogbH8Xd?usp=sharing

Thanks in advance.

Dec 15 '22 18:12 peterwilli

Hey @peterwilli,

Sure, I think if you want to use the "native" upscaler pipeline you can just do:

diffuser_pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    torch_dtype=torch.float16,
)

instead of:

diffuser_pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",
    custom_pipeline="./diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_upscale.py",
    revision="fp16",
    torch_dtype=torch.float16,
)

Dec 19 '22 16:12 patrickvonplaten

But loading StableDiffusionUpscalePipeline directly, without custom_pipeline, doesn't use my pipeline then, right? I feel I misinterpreted how to apply this. Thanks for the help, by the way!

Dec 20 '22 15:12 peterwilli

@patrickvonplaten sorry for the ping, I was wondering if you saw it... And happy new year (in 1 day!)

Dec 30 '22 14:12 peterwilli

Happy new year @peterwilli - sorry for being so late here!

Jan 04 '23 22:01 patrickvonplaten