How to load a model from a file
Hello! Thank you for your excellent work! This is the first distributed large-model inference framework I've seen that supports heterogeneous devices in a P2P way. I was interested and planned to try it on my two Android devices and a 4090, but I ran into a problem: do I need to place my model files on one node, or on every node in the cluster? And if I use a pre-downloaded model, how do I import it? Looking forward to your reply. Thank you very much!
Just try to chat or send an inference request; exo starts downloading the related model automatically.
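For reference, a single request to the node's ChatGPT-compatible endpoint is enough to trigger the download. A minimal sketch, assuming the default local port 52415 and the `llama-3.2-3b` model name (check your node's startup output for the actual port):

```python
import json
import urllib.request

# Request body for the ChatGPT-compatible /v1/chat/completions endpoint.
# The model name and port 52415 are assumptions -- adjust to your setup.
payload = json.dumps({
    "model": "llama-3.2-3b",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

req = urllib.request.Request(
    "http://localhost:52415/v1/chat/completions",
    data=payload,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req) as resp:
        # On a live cluster this prints the model's reply; on first use,
        # sending this request is what kicks off the model download.
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
except OSError as exc:
    # No node reachable on this machine.
    print(f"no exo node reachable: {exc}")
```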
Thank you for your answer! I'm just wondering: if my network is not fast enough to download the model on demand (so I'd rather download it locally first), where do I put the model files? Do I need to make any changes to the code to do that? Thank you very much for your answer!
Models by default are stored in ~/.cache/huggingface/hub.
You can set a different model storage location by setting the HF_HOME env var. I guess we could put them into
~/.cache/huggingface/hub then.
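To make the two locations concrete, here is a small sketch of how the default cache path relates to HF_HOME (the `/data/hf-cache` path is just a hypothetical example):

```python
import os
from pathlib import Path

# Default: Hugging Face tooling caches downloaded models here.
default_cache = Path.home() / ".cache" / "huggingface" / "hub"

# Setting HF_HOME relocates the whole cache tree; model files then
# land under $HF_HOME/hub instead. Must be set before the HF
# libraries are imported (or exported in the shell before launch).
os.environ["HF_HOME"] = "/data/hf-cache"  # hypothetical location
relocated_cache = Path(os.environ["HF_HOME"]) / "hub"

print(default_cache)
print(relocated_cache)  # /data/hf-cache/hub
```

So a pre-downloaded model can simply be copied into that `hub` directory on the node before starting.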
But which model file is, e.g., "model": "llama-3.2-3b" supposed to load?
Which quantization is used and which provider?
Is there a registry that specifies this?
EDIT: I see this is the registry: exo/models.py
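For readers skimming the thread, the idea of such a registry can be sketched roughly like this: a short chat-facing name is resolved to a concrete Hugging Face repo (which fixes the quantization and provider) per inference engine. The engine names and repo ids below are purely illustrative, not copied from exo/models.py:

```python
# Hypothetical sketch of a short-name -> repo registry, one entry
# per inference engine. The actual mapping lives in exo/models.py.
MODEL_REGISTRY = {
    "llama-3.2-3b": {
        "mlx": "mlx-community/Llama-3.2-3B-Instruct-4bit",       # example only
        "tinygrad": "unsloth/Llama-3.2-3B-Instruct",             # example only
    },
}

def resolve_repo(short_name: str, engine: str) -> str:
    """Map a chat-facing model name to the repo a given engine would download."""
    return MODEL_REGISTRY[short_name][engine]

print(resolve_repo("llama-3.2-3b", "mlx"))
```

This is why the same `"model"` string can pull a different file (and quantization) depending on which engine a node runs.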
I think I have this problem too. Because of China's Great Firewall, we can't connect to the model server, so for me it is hard to load a local model into the exo app.
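One workaround worth noting: the Hugging Face libraries honour the HF_ENDPOINT environment variable, so downloads can be pointed at a reachable mirror instead of huggingface.co. A sketch, assuming hf-mirror.com as the mirror (use whichever mirror you trust):

```python
import os

# Redirect Hugging Face downloads to a mirror; must be set before
# the HF libraries are imported (or exported before launching exo).
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # example mirror

# Alternative: pre-download on a machine with access, then copy
# ~/.cache/huggingface/hub to the same path on the node.
print(os.environ["HF_ENDPOINT"])
```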
#747