hfm
Update to also support newer HF cache format
Hugging Face Transformers has a new cache format that looks like this:
```
$ tree ~/.cache/huggingface/hub/
/Users/thesephist/.cache/huggingface/hub/
├── models--EleutherAI--gpt-j-6B
│   ├── blobs
│   │   ├── 22fabbdda08346a6dfb95b1782a4efb6f876f2c2
│   │   ├── 47ffebc226205cbdaf3d3047c0b7f64b67620deb
│   │   ├── 6636bda4a1fd7a63653dffb22683b8162c8de956
│   │   ├── 84ef7fb594b5c0979e48bdeddb60a0adef33df0b
│   │   ├── a9d7d93cc226c6364c7e1c58b3a56de9327080cb
│   │   └── b5c42538c02dc5dfcfaf783388d7922e78a28730
│   ├── refs
│   │   └── main
│   └── snapshots
│       └── 918ad376364058dee23512629bc385380c98e57d
│           ├── added_tokens.json -> ../../blobs/a9d7d93cc226c6364c7e1c58b3a56de9327080cb
│           ├── merges.txt -> ../../blobs/6636bda4a1fd7a63653dffb22683b8162c8de956
│           ├── special_tokens_map.json -> ../../blobs/22fabbdda08346a6dfb95b1782a4efb6f876f2c2
│           ├── tokenizer.json -> ../../blobs/47ffebc226205cbdaf3d3047c0b7f64b67620deb
│           ├── tokenizer_config.json -> ../../blobs/b5c42538c02dc5dfcfaf783388d7922e78a28730
│           └── vocab.json -> ../../blobs/84ef7fb594b5c0979e48bdeddb60a0adef33df0b
```
This layout is more sophisticated than a flat list of files, so we probably can't support the format fully, but we should at least be able to see and manage all of the cached models (and maybe individual model snapshots?), along with the disk space each one takes up.
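As a rough sketch of what "see the cached models and their sizes" could look like: the new layout names each model directory `models--{org}--{name}`, and the actual file contents live under `blobs/` (the files in `snapshots/` are just symlinks into it), so summing blob sizes per model directory should give per-model disk usage without double-counting snapshots. The function name and signature below are hypothetical, not part of hfm today:

```python
from pathlib import Path


def list_cached_models(hub_dir: Path = Path.home() / ".cache/huggingface/hub"):
    """Yield (model_name, total_bytes) for each model in the new hub cache.

    Assumes the layout shown above: directories named "models--{org}--{name}",
    with real file contents in "blobs/" and symlinks in "snapshots/".
    """
    for entry in sorted(hub_dir.glob("models--*")):
        # "models--EleutherAI--gpt-j-6B" -> "EleutherAI/gpt-j-6B"
        name = entry.name.removeprefix("models--").replace("--", "/")
        blobs = entry / "blobs"
        # Sum only the blob files; snapshot entries are symlinks into blobs/,
        # so counting them too would double-count every file.
        size = (
            sum(f.stat().st_size for f in blobs.iterdir() if f.is_file())
            if blobs.is_dir()
            else 0
        )
        yield name, size
```

One caveat: `--` inside an org or model name is ambiguous under this naming scheme, so the name reconstruction here is best-effort; per-snapshot accounting would additionally need to resolve each symlink in `snapshots/<revision>/` back to its blob.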