Live/auto-fetching model info
I noticed the # downloads and # likes in the model dropdown are hard-coded. I dug around in the Hugging Face Hub API and found that we can access this info via GET requests. Here's an example of doing it for gte-tiny:
```js
const url = "https://huggingface.co/api/models/TaylorAI/gte-tiny";
const headers = {
  "user-agent": "unknown/None;",
  "Accept-Encoding": "gzip, deflate",
  "Accept": "*/*",
  "Connection": "keep-alive",
};

fetch(url, { method: "GET", headers: headers })
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error("Error:", error));
```
You can enter the above code in the JS console to verify that it works. The `downloads` and `likes` entries give us those values directly, but the model size is a bit harder, especially because we are hard-coding the size of the ONNX model. I'm not sure if we can even get that value.
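For reference, grabbing just those two fields could look like this (a minimal sketch; `downloads` and `likes` are the field names that endpoint returns):

```js
// Fetch only the download and like counts for a model from the Hub API (sketch)
async function fetchModelStats(modelId) {
  const response = await fetch(`https://huggingface.co/api/models/${modelId}`);
  if (!response.ok) throw new Error(`HF API returned ${response.status}`);
  const data = await response.json();
  // downloads and likes come straight from the response
  return { downloads: data.downloads, likes: data.likes };
}

// e.g. in the JS console:
fetchModelStats("TaylorAI/gte-tiny")
  .then(({ downloads, likes }) => console.log(`${downloads} downloads, ${likes} likes`))
  .catch(error => console.error("Error:", error));
```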
Additionally, I want to ask @do-me what it means when a model has many different download sizes in the dropdown, e.g. snowflake-arctic-embed-xs.
Exactly, I was trying to find a more elegant processing pipeline to get the sizes, but apparently there is no API for that.
The problem with the different sizes is the myriad of quantization variants. Just from the naming, it's not easy to tell which ONNX model corresponds to which name. Some repos have only one, others many different versions (see the sketch below the links), e.g.:
- https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5/tree/main/onnx
- https://huggingface.co/Snowflake/snowflake-arctic-embed-xs/tree/main/onnx
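To illustrate the variety, here is a quick sketch that lists what sits in one of those onnx folders. It assumes the Hub's `tree` endpoint, which (if I read the API right) returns each file's `path` and `size`; treat the endpoint and field names as an assumption:

```js
// List the ONNX variants of a repo and their raw file sizes (sketch, assumes the tree endpoint)
async function listOnnxVariants(modelId) {
  const url = `https://huggingface.co/api/models/${modelId}/tree/main/onnx`;
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HF API returned ${response.status}`);
  const entries = await response.json();
  // The file names alone (model.onnx, model_quantized.onnx, model_fp16.onnx, ...)
  // don't say much about which dropdown entry they belong to.
  return entries.map(e => `${e.path} (${(e.size / 1e6).toFixed(1)} MB)`);
}

listOnnxVariants("Snowflake/snowflake-arctic-embed-xs").then(console.log);
```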
I documented all of my findings here: https://github.com/do-me/trending-huggingface-models and in the Jupyter notebook there. The repo also generates a ready-to-copy-and-paste HTML section for SemanticFinder.
As a side note, it also includes experimental models like https://huggingface.co/onnx-community/decision-transformer-gym-halfcheetah-expert/tree/main/onnx that do not work for semantic similarity.
For ease of use, I decided to include all file sizes to give the user at least some idea of how heavy the model is, but that's certainly not the best way.
I was also considering saying goodbye to hard-coded models and using a free text field like here instead. But that way it becomes harder for non-expert users...
Maybe something in between would be good:
- a free text input field that can be used for any model
- a dropdown for our "chef's selection" of good models. Selecting a model here would simply copy its value into the text input (see the sketch below).
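Roughly what I have in mind (just a sketch of the idea; the markup and element IDs are made up, not SemanticFinder's actual ones):

```js
// Sketch: free text field + curated "chef's selection" dropdown that only pre-fills it.
// Assumed markup (IDs are made up):
//   <input id="model-input" type="text" placeholder="org/model-name">
//   <select id="model-select">
//     <option value="">chef's selection…</option>
//     <option value="TaylorAI/gte-tiny">gte-tiny</option>
//     <option value="Snowflake/snowflake-arctic-embed-xs">snowflake-arctic-embed-xs</option>
//   </select>
const modelInput = document.getElementById("model-input");
const modelSelect = document.getElementById("model-select");

// Picking a curated model just copies its id into the free text field;
// expert users can still type any Hugging Face model id directly.
modelSelect.addEventListener("change", () => {
  if (modelSelect.value) modelInput.value = modelSelect.value;
});
```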
We just need to keep an eye on the index file loading logic so that nothing breaks. Some people have already contributed files and are (supposedly) actively using this logic.
What's your take on this?
Just checking, are the files that Jhnbsomersxkhi2 contributed live anywhere? I think I'll definitely start trying to contribute to that HF repo
For the GET API, we can get the size of the default model in terms of the number of fp32 and fp16 parameters, and from there we could compute the model size, but as you said, I'm not sure we can get the size of the quantized models. That said, we could statically encode the model size (maybe both the default and quantized) and then dynamically fetch # likes and # downloads.
> Just checking, are the files that Jhnbsomersxkhi2 contributed live anywhere? I think I'll definitely start trying to contribute to that HF repo
Sure, you can find all files either in the README catalogue or directly in the files section. HF is pretty much like GitHub.
I would love some kind of functionality where users can simply click a "Publish to Hugging Face" button to open a PR with the index and correctly formatted metadata, similar to when you share your results here.
> For the GET API, we can get the size of the default model in terms of the number of fp32 and fp16 parameters, and from there we could compute the model size, but as you said, I'm not sure we can get the size of the quantized models. That said, we could statically encode the model size (maybe both the default and quantized) and then dynamically fetch # likes and # downloads.
So wait, we can get the regular size of the model via GET request, right? If I remember correctly, the quantized models' sizes then follow a pretty linear scheme, e.g. fp16 is always ~50% of the regular model's size, q4 always ~25%, and so on. That seems like the easiest option to me to avoid hard-coding and to keep the method future-proof.
The only issue might be that if we calculate all file sizes for all quantization methods on the fly, it could confuse users when a model does not have, e.g., an fp16 version.
Yes, that's a great idea! What are the storage limits for Hugging Face? And yes, we can get the regular size of the model. Check out the JS query above and look under `safetensors.parameters`.
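Putting the two ideas together, something along these lines might work (a sketch; `safetensors.parameters` is the field from the GET response mentioned above, but not every repo exposes it, and the quantized sizes are rough estimates from bytes per parameter — 4 for fp32, 2 for fp16, 1 for int8 — so actual ONNX file sizes will differ somewhat):

```js
// Estimate model sizes from parameter counts and fetch live stats (sketch)
async function estimateModelInfo(modelId) {
  const response = await fetch(`https://huggingface.co/api/models/${modelId}`);
  const data = await response.json();

  // e.g. { "F32": 22713216 } — the field is missing for some repos
  const paramCounts = data.safetensors?.parameters ?? {};
  const totalParams = Object.values(paramCounts).reduce((a, b) => a + b, 0);

  const toMB = bytes => (bytes / 1e6).toFixed(1);
  return {
    downloads: data.downloads,
    likes: data.likes,
    fp32MB: toMB(totalParams * 4), // 4 bytes per fp32 parameter
    fp16MB: toMB(totalParams * 2), // ~50% of fp32
    int8MB: toMB(totalParams * 1), // ~25% of fp32
  };
}

estimateModelInfo("TaylorAI/gte-tiny").then(console.log);
```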
FYI, I'm imagining something like this for selecting a model: https://jsfiddle.net/vtkrqxgh/