hfm
Update to also support newer HF cache format
Hugging Face Transformers has a new cache format that looks like this:
```
$ tree ~/.cache/huggingface/hub/
/Users/thesephist/.cache/huggingface/hub/
├── models--EleutherAI--gpt-j-6B
│   ├── blobs
│   │   ├── 22fabbdda08346a6dfb95b1782a4efb6f876f2c2
│   │   ├── 47ffebc226205cbdaf3d3047c0b7f64b67620deb
│   │   ├── 6636bda4a1fd7a63653dffb22683b8162c8de956
│   │   ├── 84ef7fb594b5c0979e48bdeddb60a0adef33df0b
│   │   ├── a9d7d93cc226c6364c7e1c58b3a56de9327080cb
│   │   └── b5c42538c02dc5dfcfaf783388d7922e78a28730
│   ├── refs
│   │   └── main
│   └── snapshots
│       └── 918ad376364058dee23512629bc385380c98e57d
│           ├── added_tokens.json -> ../../blobs/a9d7d93cc226c6364c7e1c58b3a56de9327080cb
│           ├── merges.txt -> ../../blobs/6636bda4a1fd7a63653dffb22683b8162c8de956
│           ├── special_tokens_map.json -> ../../blobs/22fabbdda08346a6dfb95b1782a4efb6f876f2c2
│           ├── tokenizer.json -> ../../blobs/47ffebc226205cbdaf3d3047c0b7f64b67620deb
│           ├── tokenizer_config.json -> ../../blobs/b5c42538c02dc5dfcfaf783388d7922e78a28730
│           └── vocab.json -> ../../blobs/84ef7fb594b5c0979e48bdeddb60a0adef33df0b
```
This layout is more sophisticated than a flat list of files, so we probably can't support the format fully, but we should at least be able to see and manage all of the cached models (and maybe individual model snapshots?), along with the disk space each one takes up.
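As a rough sketch of what "see the cached models and their sizes" could look like: the new layout names each model directory `models--{org}--{name}`, and the actual file contents live under `blobs/` (the files in `snapshots/` are just symlinks into it), so summing blob sizes per model directory should give per-model disk usage without double-counting snapshots. The function name and signature below are hypothetical, not part of hfm today:

```python
from pathlib import Path


def list_cached_models(hub_dir: Path = Path.home() / ".cache/huggingface/hub"):
    """Yield (model_name, total_bytes) for each model in the new hub cache.

    Assumes the layout shown above: directories named "models--{org}--{name}",
    with real file contents in "blobs/" and symlinks in "snapshots/".
    """
    for entry in sorted(hub_dir.glob("models--*")):
        # "models--EleutherAI--gpt-j-6B" -> "EleutherAI/gpt-j-6B"
        name = entry.name.removeprefix("models--").replace("--", "/")
        blobs = entry / "blobs"
        # Sum only the blob files; snapshot entries are symlinks into blobs/,
        # so counting them too would double-count every file.
        size = (
            sum(f.stat().st_size for f in blobs.iterdir() if f.is_file())
            if blobs.is_dir()
            else 0
        )
        yield name, size
```

One caveat: `--` inside an org or model name is ambiguous under this naming scheme, so the name reconstruction here is best-effort; per-snapshot accounting would additionally need to resolve each symlink in `snapshots/<revision>/` back to its blob.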