
method to prepare offline cache

Open keturn opened this issue 3 years ago • 7 comments

.from_pretrained() can work in offline mode by loading from the cache, but we lack a method to explicitly populate that cache.

I'd like something along the lines of .cache_pretrained_for_offline() that fetches all the files necessary for .from_pretrained() but doesn't actually load the weights.

Use cases would include:

  • something you do in an installation step or a "prepare for offline use" action to avoid loading delays later in the application, or in anticipation of network access becoming unavailable.
  • preparing an environment (archive, container, disk image, etc) on a low-resource machine that will then be copied over to the high-spec machine for production use.

It should be able to run without a GPU (or other intended target device for the model) or heaps of RAM.

The advantage of populating the huggingface_hub cache with the model, instead of saving a copy of the model to an application-specific local path, is that you get to share that cache with other applications, you don't need any extra code to apply updates to your copy, you don't need any switch to change from the default on-demand loading location to your local copy, etc.
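For illustration, a helper along those lines could be a thin wrapper over huggingface_hub.snapshot_download. This is only a sketch: the method name cache_pretrained_for_offline and the file patterns are assumptions, not actual diffusers API.

```python
def cache_pretrained_for_offline(repo_id, revision=None, download=None):
    """Sketch: fetch everything from_pretrained() needs into the hub cache,
    without loading the weights into RAM. The pattern list below is an
    assumption; the real filtering lives inside from_pretrained."""
    if download is None:
        # Deferred import so the sketch can be exercised with a stub,
        # without network access.
        from huggingface_hub import snapshot_download
        download = snapshot_download
    # Configs plus weight files (illustrative, not the real filter set).
    allow_patterns = ["*.json", "*.txt", "*.bin", "*.safetensors"]
    # Returns the local cache folder that from_pretrained() would later reuse.
    return download(repo_id, revision=revision, allow_patterns=allow_patterns)
```

A later from_pretrained(repo_id) call in offline mode would then find everything it needs already in the shared cache.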

keturn avatar Nov 15 '22 22:11 keturn

Hey @keturn,

Would it then maybe not be better to directly use the Hugging Face Hub for this, e.g.:

from huggingface_hub import snapshot_download

folder = snapshot_download("CompVis/stable-diffusion-v1-4")

See docs here: https://huggingface.co/docs/huggingface_hub/main/en/package_reference/file_download#huggingface_hub.snapshot_download

Not 100% sure if I understand your use case here perfectly - let me know if you were looking for another solution!

patrickvonplaten avatar Nov 18 '22 12:11 patrickvonplaten

That, but there are a bunch of additional lines in Pipeline.from_pretrained to set up the allow_patterns and ignore_patterns, and maybe some additional stuff for Custom Pipelines. And that's just for Pipelines, idk if the other objects with the from_pretrained interface have their own quirks like that too.
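Those extra lines are mostly about deciding which files to fetch. As a rough, assumed sketch of that kind of allow/ignore pattern setup (the variant names and file suffixes below are illustrative, not copied from diffusers internals):

```python
def variant_patterns(variant=None):
    """Sketch of building snapshot_download filters for one weight variant
    (e.g. 'fp16'); the suffixes are illustrative assumptions."""
    configs = ["*.json", "*.txt"]
    if variant is None:
        # Default weights: take plain weight files, skip variant-suffixed ones.
        return {
            "allow_patterns": configs + ["*.bin", "*.safetensors"],
            "ignore_patterns": ["*.fp16.*", "*.non_ema.*"],
        }
    # Variant weights: only fetch files carrying the variant suffix.
    return {
        "allow_patterns": configs + [f"*.{variant}.bin", f"*.{variant}.safetensors"],
        "ignore_patterns": [],
    }
```

The point is that a plain snapshot_download call without such filtering pulls every file in the repo, which is exactly what from_pretrained avoids.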

keturn avatar Nov 18 '22 19:11 keturn

I see,

Hmm, I'm currently not sure if it's worth adding a new function for this, as it would force us to make a difficult design decision now. Do you think for now it'd be possible to just copy those lines of code?

It's a bit of an edge case for me at the moment - also considering that we don't have that kind of functionality yet for the much more used transformers code base: https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py

patrickvonplaten avatar Nov 20 '22 19:11 patrickvonplaten

I can do that as a stopgap. This isn't on the "must have" list for the initial InvokeAI integration. But obviously "copy and maintain some code that involves internal details of model distribution that are usually encapsulated away" is not something anyone wants to do long-term.

I didn't expect this to come with a difficult design decision! But I guess it does involve settling on a new method name for the public API, and naming things is always hard. 😅

keturn avatar Nov 20 '22 23:11 keturn

Just to better understand: why doesn't it make sense to use the whole from_pretrained(...) function here? Is it just so that weights are not unnecessarily loaded into RAM, or are there other reasons as well? I guess it's also because you don't want to be dependent on the token later down the road?

Could another approach be maybe to just do:

target_dir = "path/to/target/dir"
DiffusionPipeline.from_pretrained("....", cache_dir=target_dir)

This way the model will be loaded once into CPU but then removed by the garbage collector, since the output of from_pretrained(...) is nowhere saved.

Overall, would it help to try to convince the authors of stable diffusion to remove the "Request Access" mechanism so that you don't have to rely on special loading logic?

patrickvonplaten avatar Nov 21 '22 11:11 patrickvonplaten

Is it just so that weights are not unnecessarily loaded into RAM

Mostly this, yes. It's a lot of RAM.

And also when you try to load fp16 weights without a CUDA device, it spams a lot of verbose warnings (probably one for each sub-model with weights) that I haven't found a clean way to suppress.

would it help to try to convince the authors of stable diffusion to remove the "Request Access" mechanism

That'd be great for other reasons we've discussed, but is only tangentially related here. We have a requirement to make a self-contained installation for offline mode regardless of license.

keturn avatar Nov 21 '22 19:11 keturn

I see! This PR could help a bit: https://github.com/huggingface/diffusers/pull/1450

However, it still forces one to load the model into RAM, but doing something like:

from diffusers import DiffusionPipeline
import gc

_, local_path = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2", return_cached_folder=True)
gc.collect()

would be a simple fix for now. In the future we could factor out the whole downloading function. But since this function is still very prone to change and I don't see the use case of 0-RAM downloading the models as very important at the moment, I'd prefer to have one long, readable from_pretrained function for now.

patrickvonplaten avatar Nov 28 '22 12:11 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Dec 22 '22 15:12 github-actions[bot]