Change step_offset scheduler docstrings
See https://github.com/huggingface/diffusers/discussions/6931 and https://github.com/huggingface/diffusers/issues/6068 for background
Right now it changes all the docstrings from
> An offset added to the inference steps. You can use a combination of `offset=1` and `set_alpha_to_one=False` to make the last step use step 0 for the previous alpha product like in Stable Diffusion.
to simply
> An offset added to the inference steps.
Because that really shouldn't apply to any schedulers except DDIM, and in my opinion even that is arguable. `set_alpha_to_one=True` is essentially `final_sigmas_type="zero"` for DDIM, so I'm unsure why disabling it was ever recommended.
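To make the equivalence concrete, here's a minimal pure-Python sketch (not the actual diffusers code; the helper names and the toy schedule are made up for illustration) of how DDIM picks the terminal alpha product and what sigma that implies:

```python
import math

def final_alpha_prod(alphas_cumprod, set_alpha_to_one):
    # For the last denoising step the "previous" timestep is < 0,
    # so DDIM substitutes a terminal alpha product instead.
    return 1.0 if set_alpha_to_one else alphas_cumprod[0]

def sigma_from_alpha_prod(alpha_prod):
    # Noise level (sigma) corresponding to a cumulative alpha product.
    return math.sqrt((1.0 - alpha_prod) / alpha_prod)

# Toy cumulative alpha-product schedule (index 0 = least-noisy train step).
alphas_cumprod = [0.9991, 0.9980, 0.9965]

sigma_true = sigma_from_alpha_prod(final_alpha_prod(alphas_cumprod, True))
sigma_false = sigma_from_alpha_prod(final_alpha_prod(alphas_cumprod, False))
print(sigma_true)   # 0.0 -> fully denoised, same idea as final_sigmas_type="zero"
print(sigma_false)  # > 0 -> residual noise left in the image
```

With `set_alpha_to_one=False` the last step never reaches sigma 0, which is exactly the leftover noise described below.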
Whether it's from this docstring or otherwise, the official SDXL config incorrectly sets `set_alpha_to_one=False` despite using Euler, where it doesn't apply, resulting in DDIM performing unusably when inheriting the config. Rest in peace, @bghira's early Terminus checkpoints.
"An offset added to the inference steps." isn't very descriptive, but I'm not entirely sure about the motivation behind this value so I'm leaving it as-is for now. Usually if you want to sample later timesteps you'd use a solution like timestep_spacing="trailing" instead.
cc @pcuenca and @patil-suraj here again to see if anyone knows why we recommended using `steps_offset=1` and `set_alpha_to_one=False`.
see more https://github.com/huggingface/diffusers/discussions/6931
I almost wonder if it'd be worth having an anti-recommendation?
Like for `final_sigmas_type` and `set_alpha_to_one`, have a lil blurb that's something like:
> Only disable [this] if you're certain it's required.
Even if SAI never updates their goofy SDXL config, it might help people looking in the docs understand why their outputs might be bad?
Between plain Epsilon, Turbo, V-Pred, and ZSNR I haven't observed a single model that would require clipping the final step instead of going straight for zero. Maybe that only applies to certain timestep spacings..?
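For reference, the "clipping vs. going straight for zero" choice boils down to something like this simplified sketch (assumption: condensed from how schedulers terminate their sigma schedule; this is not the literal diffusers implementation):

```python
def append_final_sigma(sigmas, final_sigmas_type="zero"):
    # "zero" drives the last step all the way to a fully denoised image;
    # "sigma_min" clips the schedule at the smallest training sigma instead,
    # leaving a small amount of noise unsampled.
    if final_sigmas_type == "zero":
        return sigmas + [0.0]
    if final_sigmas_type == "sigma_min":
        return sigmas + [sigmas[-1]]
    raise ValueError(f"unknown final_sigmas_type: {final_sigmas_type}")

schedule = [14.6, 3.2, 0.7, 0.03]  # toy descending sigma schedule
print(append_final_sigma(schedule, "zero")[-1])       # 0.0
print(append_final_sigma(schedule, "sigma_min")[-1])  # 0.03
```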
I've been checking the history a bit and it looks like `steps_offset` was formally introduced in the configuration in https://github.com/huggingface/diffusers/pull/479, motivated by https://github.com/huggingface/diffusers/issues/465. In particular, the different scheduling configurations (at the time) were abstracted to those two properties as summarized by Patrick in this comment: https://github.com/huggingface/diffusers/issues/465#issuecomment-1243702538. So the use of `set_alpha_to_one=False` and `steps_offset=1` was required for Stable Diffusion.
In #3947, I think we just went through other schedulers and replicated the same logic. IIRC we checked results against the SDXL implementation, but I'm not completely sure if we used DDIM or Euler as a baseline (I think it was DDIM and then changed to Euler).
> So the use of set_alpha_to_one=False and steps_offset=1 was required for Stable Diffusion.
Was, as in it's no longer required at all? Because I can't find a purpose for either property.
I ran every combination of `steps_offset`, `set_alpha_to_one`, and `timestep_spacing` on both SDXL and SD1.5. All `set_alpha_to_one=False` seems to do is make the image noisier, which is what you'd expect since the final sigma isn't zero.
Demo with as high a JPG quality as GitHub will allow. You can particularly see the unsampled noise left over in the flat blue sky in any image where `set_alpha_to_one=False`.
For reproduction:
- Prompt: colorful vector art of a fennec fox on mars
- Negative prompt: blurry, noisy, cropped
- Seed: 1
- Steps: 30
- Guidance: 5
- Rescale: 0
- Noise added using f32 on CPU
@yiyixuxu WDYT of merging this and letting the discussion continue?
@Beinsezii Thanks a lot for the tests! I don't have an answer unfortunately, maybe a bug was introduced after those PRs, or maybe the problem was always there from the original stable diffusion codebase (I'm pretty sure we compared model outputs when the conversion was made), or maybe the logic in that PR was flawed after all. I did browse the current DDIM code vs the version in the PR and nothing stood out as being obviously wrong. Perhaps we could run generations immediately before and after the change in the PR and see if anything changes. I'd love to help with that, but I'm very time-constrained this week :(
Regarding the doc changes in this PR, I'm supportive if you think they'll help reduce the confusion. It may be a bit puzzling for new users; perhaps we can improve it a bit without mentioning `set_alpha_to_one`.
Another idea would be to go back to checking against the reference CompVis codebase, comparing outputs. Our guiding light for integration was to generate 1-to-1 identical results (on CPU using float32), and I think that's the goal we should still have as a library. The test suite should ensure that continues to be the case, though it might be imperfect.
The CompVis repo has a bunch of pinned dependencies that don't work on my GPU, so I can't check that myself.
Maybe something like ComfyUI could also be a decent benchmark? I believe SAI said they were using it for their Discord image gen bot during the SDXL trial period.
@yiyixuxu do we wanna merge it?
@patrickvonplaten maybe we want to update the SAI configs on the hub once this is merged? it's not blocked by it, this is just a documentation fix. but training / inference code still inherits the bad config.