[Refactor] Better align `from_single_file` logic with `from_pretrained`
What does this PR do?
Single file loading is still far from ideal. We should aim to align its behaviour as closely as possible with from_pretrained (with the goal of converging to a single loading method).
Some of the issues that need to be addressed
- Different ways of setting model/pipeline configurations. We currently support arguments such as num_in_channels, scheduler_type, and load_safety_checker which are not supported in the diffusers model configs or in from_pretrained. We should deprecate these in favour of the same configuration override methods that we use in from_pretrained.
- Configuring model parameters with heuristics based on the pipeline name, e.g. setting the number of input channels based on the invoking class. This isn't great because we should be able to configure a model based on the information within the checkpoint alone.
- Loading pipeline components is still quite rigid and relies heavily on heuristics to fetch/load the components. We should be able to reuse model_index.json files to determine the correct classes for each component in the pipeline.
- Not respecting the configured scheduler type in the model_index.json file on the hub. Single file currently defaults to always using the DDIM scheduler
This PR attempts to get single file loading behaviour much closer to the logic used in from_pretrained
For Models
- It pushes the model loading logic into a FromOriginalModelMixin that fetches the appropriate model config.json file from the hub based on the keys provided in the checkpoint
- Defines model specific mapping functions that convert the original state dict to a diffusers state dict.
- Applies this FromOriginalModelMixin to all Diffusers models that are meant to support from_single_file loading
This allows us to rely on a lot of the functionality already defined in ModelMixin and DiffusionPipeline to create the correct model
For Pipelines
- Rather than relying on the original YAML files to configure the pipeline and models, we should opt to identify the appropriate model repo related to the single file checkpoint based on the checkpoint keys. This allows us to fetch the model_index.json file for this checkpoint and load the components using logic similar to from_pretrained. Note that loading a pipeline/model via YAML is still supported; it is just no longer the default.
- Allow passing in local_dir and local_dir_use_symlinks arguments to control where checkpoints are downloaded and to disable symlinking if users request it.
TODO:
- General clean up
- Some clean up in single_file_utils
- Move the Cascade loading logic to follow this system
- Add tests to ensure this isn't backwards breaking
- Improve the docs a bit more to demonstrate single file functionality.
- Look into supporting connected pipelines via this approach.
This should make it a bit easier to add more Pipelines/Models with single file support. The process would become
- Define a mapping function for the model from the original dict to diffusers dict
- Create a model repo with the config files
- Find a way to infer the model type from the checkpoint and fetch the appropriate config
We already have to take care of steps 1 and 2 when adding a model to diffusers. So it's just a matter of inferring the proper config from the checkpoint (not easy but very possible).
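Step 1 above can be sketched as a plain key-renaming function over the checkpoint's state dict. This is a hypothetical illustration; the key names below do not reflect the real checkpoint layout of any particular model:

```python
# Hypothetical mapping from original checkpoint keys to diffusers keys.
ORIGINAL_TO_DIFFUSERS_KEY_MAP = {
    "model.diffusion_model.input_blocks.0.0.weight": "conv_in.weight",
    "model.diffusion_model.input_blocks.0.0.bias": "conv_in.bias",
}


def convert_original_state_dict(original_state_dict):
    """Rename keys from the original layout to the diffusers layout.

    Keys without a mapping entry are passed through unchanged.
    """
    converted = {}
    for key, value in original_state_dict.items():
        converted[ORIGINAL_TO_DIFFUSERS_KEY_MAP.get(key, key)] = value
    return converted


sd = {"model.diffusion_model.input_blocks.0.0.weight": [1.0]}
print(convert_original_state_dict(sd))  # {'conv_in.weight': [1.0]}
```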
Fixes # (issue)
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline?
- [ ] Did you read our philosophy doc (important for complex PRs)?
- [ ] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
love it!
we should opt to identify the appropriate model repo related to the single file checkpoint based on the checkpoint keys. This allows us to fetch the model_index.json file for this checkpoint
this is great, just let's ensure that the config location can be a local folder as well as huggingface.co itself, as there are plenty of users without internet connectivity (corp firewalls, great wall of china, etc.) (in that case, it would be up to the app to have all config files available offline)
Allow passing in local_dir and local_dir_use_symlinks arguments to control where checkpoints are downloaded and to disable symlinking if users request it.
even better! if model is stored without symlinks, would it still follow hash-as-snapshot-folder-name naming or can we make it so downloaded model actually looks like model on huggingface?
Update on from_single_file progress
The new version of single file will always try to infer a default pipeline config given a checkpoint. This is reasonable since the single file checkpoints we support are usually finetunes of models on the hub that have a corresponding diffusers pipeline config.
Note: when determining/fetching default pipeline configs, diffusers will not download any weights, just the config files. The only exceptions are if the pipeline has a component that doesn't support single file loading, or if the single file checkpoint doesn't contain the necessary weights for a component, e.g. the safety checker.
from_single_file also supports passing in a diffusers repo id or local path to a diffusers model repo to configure the pipeline. This is useful in cases where a single file checkpoint might be released with slight modifications to model components and diffusers hasn't updated the library to auto detect it, e.g. playground v2.5 uses a different scaling factor in the VAE. In such a case, it would be possible to still correctly configure the pipeline using:
from diffusers import StableDiffusionXLPipeline
pipe = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic/blob/main/playground-v2.5-1024px-aesthetic.fp16.safetensors",
    config="playgroundai/playground-v2.5-1024px-aesthetic",
)
Loading with the original YAML configs is also supported to maintain backwards compatibility. However, diffusers will still attempt to determine the pipeline config and fetch the model_index.json file for the pipeline to determine which model objects to load, unless local_files_only=True, in which case diffusers will try to infer the model objects based on the type hints provided in the pipelines. This is not as reliable as providing a path to a local diffusers model repo to configure the pipeline.
Docs have been updated to reflect these changes and provide updated guidance on from_single_file usage.
Additionally, this PR attempts to start deprecating some anti-patterns we supported in the past. It will not introduce any breaking changes, but will begin a deprecation cycle for the following:
- Allowing the UNet and VAE models to be configured via from_single_file using parameters like num_in_channels and image_size. This is not something we support in from_pretrained, since we recommend configuring the model directly and passing it in as a component.
- Allowing schedulers to be set with the scheduler_type argument. We generally recommend creating a scheduler and passing it in as a component.
Some final things to address:
- For the CLIP models, we originally downloaded their configs to the cache under their transformers model repo name rather than as a subfolder of the diffusion model, e.g. SD1.5's single file checkpoint would place the CLIP text encoder config under openai/clip-vit-large-patch14 rather than under runwayml/stable-diffusion-v1-5/text_encoder. This will be an issue for anyone who is running from_single_file with local_files_only=True and updates Diffusers after this PR is merged, since we will now place the CLIP configs in a subfolder under the inferred model config for the pipeline.
To correct for this, we will also look for the CLIP configs in openai/clip-vit-large-patch14 if we detect original_config and local_files_only have been set. But the plan will be to deprecate this behaviour.
- The scheduler_type argument is currently ignored in this PR, but I will update it to work and move it into a deprecation cycle.
- Ensure the new suite of single file slow tests is passing.
@DN6 some lingering questions:
from_single_file also supports passing in a diffusers repo id or local path to a diffusers model repo to configure the pipeline. This is useful in cases where a single file checkpoint might be released with slight modifications to model components and diffusers hasn't updated the library to auto detect it, e.g. playground v2.5 uses a different scaling factor in the VAE. In such a case, it would be possible to still correctly configure the pipeline using
So, the assumption here is that we already have the original single-file checkpoint converted to the diffusers format, right? This is a fair assumption because we will have the diffusers format anyway during the conversion process. Just want to confirm the reasoning.
Allowing the UNet and VAE models to be configured via from_single_file using parameters like num_in_channels and image_size. This is not something we support in from_pretrained, since we recommend configuring the model directly and passing it in as a component.
Are we going to attempt to infer those parameters from the provided checkpoints?
We generally recommend creating a scheduler and passing it in as a component.
Where do we recommend those? For the most part, people load a pipeline without configuring a scheduler. Could you clarify?
To correct for this, we will also look for the CLIP configs in openai/clip-vit-large-patch14 if we detect original_config and local_files_only have been set. But the plan will be to deprecate this behaviour.
I think we can just throw a message to the users to make them aware of what's happening, if this is important enough to know about? We are essentially moving a cache location. IIRC transformers was doing something similar too.
scheduler_type argument is ignored currently in the PR. But I will update it to work and move into a deprecation cycle.
Aren't we allowing this too? "Allowing schedulers to be set with the scheduler_type argument." What will we be deprecating here?
@vladmandic I think this is almost ready to merge. If you have the time, would you be able to test the branch to see if any breaking changes occur? I tried to make sure we have full backwards compatibility, but given the number of changes here, I just want to be on the safe side.