llama-cpp-python
Loading sharded (GGUF) model files from HF with Llama.from_pretrained()'s 'additional_files' argument
The added code allows specifying multiple files to download from the Hugging Face Hub in Llama.from_pretrained(). The new argument takes a list of strings, each treated the same way as the existing 'filename' string argument. The code could likely be more elegant (e.g. parallel downloads, but I'm not familiar enough with the HF Hub library), though it works as is. Tested and working on Windows 10 and Ubuntu (inside a Docker stack).
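A minimal sketch of how this might be used. The helper below is hypothetical (not part of the PR) and just builds shard names following the common `<stem>-0000i-of-0000n.gguf` convention; the repo and file names in the commented call are illustrative, not verified:

```python
# Hypothetical helper: build the full list of shard file names for a model
# split into `total` GGUF files, following the usual
# "<stem>-0000i-of-0000n.gguf" naming convention.
def shard_names(stem: str, total: int) -> list[str]:
    return [f"{stem}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


names = shard_names("my-model-q4_k_m", 3)
# The first shard goes to `filename`; the rest go to `additional_files`.
first, extra = names[0], names[1:]
print(first)  # my-model-q4_k_m-00001-of-00003.gguf
print(extra)  # the remaining two shard names

# Usage sketch (requires llama-cpp-python and network access; repo_id and
# shard names here are placeholders, not a real model):
# from llama_cpp import Llama
# llm = Llama.from_pretrained(
#     repo_id="some-org/some-model-GGUF",
#     filename=first,
#     additional_files=extra,
# )
```

With this, from_pretrained() downloads every listed shard before loading, so llama.cpp can find all parts of the split model locally.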