How to load a model from a file
Hello! Thank you for your excellent work! This is the first distributed large-model inference framework I've seen that supports heterogeneous devices in a P2P way. I was interested and planned to try it on my two Android devices and a 4090, but I ran into a problem: do I need to place my model files on one node, or on every node in the cluster? And if I use a pre-downloaded model, how do I import it? Looking forward to your reply. Thank you very much!
Just try to chat or send an inference request; exo starts downloading the related model automatically.
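For reference, a single request to the node's ChatGPT-compatible endpoint is enough to trigger the download. A minimal sketch, assuming the default local port 52415 and the `llama-3.2-3b` model name (check your node's startup output for the actual port):

```python
import json
import urllib.request

# Request body for the ChatGPT-compatible /v1/chat/completions endpoint.
# The model name and port 52415 are assumptions -- adjust to your setup.
payload = json.dumps({
    "model": "llama-3.2-3b",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

req = urllib.request.Request(
    "http://localhost:52415/v1/chat/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req) as resp:
        # On a live cluster this prints the model's reply; on first use,
        # sending this request is what kicks off the model download.
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
except OSError as exc:
    # No node reachable on this machine.
    print(f"no exo node reachable: {exc}")
```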
Thank you for your answer! I'm just wondering: if my network is not fast enough to download the model on demand (so I'd rather download it locally first), where do I put the model files? Do I need to make any changes to the code to do that? Thank you very much for your answer!
Models by default are stored in ~/.cache/huggingface/hub.
You can set a different model storage location by setting the HF_HOME env var. I guess we could put them into
~/.cache/huggingface/hub then.
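To make the two locations concrete, here is a small sketch of how the default cache path relates to HF_HOME (the `/data/hf-cache` path is just a hypothetical example):

```python
import os
from pathlib import Path

# Default: Hugging Face tooling caches downloaded models here.
default_cache = Path.home() / ".cache" / "huggingface" / "hub"

# Setting HF_HOME relocates the whole cache tree; model files then
# land under $HF_HOME/hub instead. Must be set before the HF
# libraries are imported (or exported in the shell before launch).
os.environ["HF_HOME"] = "/data/hf-cache"  # hypothetical location
relocated_cache = Path(os.environ["HF_HOME"]) / "hub"

print(default_cache)
print(relocated_cache)  # /data/hf-cache/hub
```

So a pre-downloaded model can simply be copied into that `hub` directory on the node before starting.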
But which model file is, e.g., "model": "llama-3.2-3b" supposed to load?
Which quantization is used and which provider?
Is there a registry that specifies this?
EDIT: I see this is the registry: exo/models.py
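For readers skimming the thread, the idea of such a registry can be sketched roughly like this: a short chat-facing name is resolved to a concrete Hugging Face repo (which fixes the quantization and provider) per inference engine. The engine names and repo ids below are purely illustrative, not copied from exo/models.py:

```python
# Hypothetical sketch of a short-name -> repo registry, one entry
# per inference engine. The actual mapping lives in exo/models.py.
MODEL_REGISTRY = {
    "llama-3.2-3b": {
        "mlx": "mlx-community/Llama-3.2-3B-Instruct-4bit",       # example only
        "tinygrad": "unsloth/Llama-3.2-3B-Instruct",             # example only
    },
}

def resolve_repo(short_name: str, engine: str) -> str:
    """Map a chat-facing model name to the repo a given engine would download."""
    return MODEL_REGISTRY[short_name][engine]

print(resolve_repo("llama-3.2-3b", "mlx"))
```

This is why the same `"model"` string can pull a different file (and quantization) depending on which engine a node runs.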
I think I have this problem too. Because of China's Great Firewall, we can't connect to the model server, so for me it is hard to load a local model into the exo app.
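One workaround worth noting: the Hugging Face libraries honour the HF_ENDPOINT environment variable, so downloads can be pointed at a reachable mirror instead of huggingface.co. A sketch, assuming hf-mirror.com as the mirror (use whichever mirror you trust):

```python
import os

# Redirect Hugging Face downloads to a mirror; must be set before
# the HF libraries are imported (or exported before launching exo).
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # example mirror

# Alternative: pre-download on a machine with access, then copy
# ~/.cache/huggingface/hub to the same path on the node.
print(os.environ["HF_ENDPOINT"])
```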
#747