USBhost
> I'd try the latest default with the smallest model first to make sure that the quantization works and that the resulting safetensors can be loaded in the web UI....
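For that kind of smoke test, something along these lines should work. This is only a sketch: the paths and output names are placeholders, and I'm assuming the current GPTQ-for-LLaMa and web UI flag names (`--wbits`, `--groupsize`, `--model_type`).

```
# Quantize the smallest model first (paths/names are placeholders)
cd repositories/GPTQ-for-LLaMa
python llama.py ../../models/llama-7b c4 --wbits 4 --groupsize 128 \
    --save_safetensors ../../models/llama-7b-4bit-128g/llama-7b-4bit-128g.safetensors

# Then confirm the web UI can actually load the result
cd ../..
python server.py --model llama-7b-4bit-128g --wbits 4 --groupsize 128 --model_type llama
```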
```
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 4 (delta 3), reused 4 (delta 3), pack-reused 0
Unpacking...
```
```
Traceback (most recent call last):
  File "/UI/text-generation-webui/server.py", line 234, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/UI/text-generation-webui/modules/models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "/UI/text-generation-webui/modules/GPTQ_loader.py", line 69, in load_quantized...
```
GPTQ 4-bit models do not load if they were quantized with [act-order](https://github.com/oobabooga/text-generation-webui/issues/541#issuecomment-1483184836). I am currently testing true-sequential, if that's okay. Edit: true-sequential loads, so only act-order is broken.
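For reference, the two runs look roughly like this. This is just a sketch of the GPTQ-for-LLaMa invocations; the model paths and output names are placeholders.

```
# Loads in the web UI: quantized with --true-sequential (paths/names are placeholders)
python llama.py models/llama-7b c4 --wbits 4 --true-sequential \
    --save_safetensors llama-7b-4bit-ts.safetensors

# Does not load right now: quantized with --act-order
python llama.py models/llama-7b c4 --wbits 4 --act-order \
    --save_safetensors llama-7b-4bit-ao.safetensors
```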
We could follow the naming used in the GPTQ-for-LLaMa README example, https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/main/README.md?plain=1#L119 ? Having to ship a separate file just for one parameter seems wasteful.
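If I read that example right, the idea would be to bake the quantization parameters into the file name itself so the loader can parse them instead of reading a sidecar file. Something like the following; the exact pattern here is my guess, not a settled convention.

```
# Hypothetical: encode wbits/groupsize in the output name so no extra config file is needed
python llama.py models/llama-65b c4 --wbits 4 --groupsize 128 \
    --save_safetensors llama-65b-4bit-128g.safetensors
```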
Anyway, tomorrow I should have a torrent up with all the new 4-bit stuff. Here's my overnight cooking recipe:
```
#!/bin/bash
. ../venv/bin/activate;
python ../repositories/GPTQ-for-LLaMa/llama.py llama-65b c4 --new-eval --wbits...
```
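The command above is cut off here; for anyone wanting to cook their own, a full overnight run would look roughly like this. This is my own sketch, not the exact script behind the torrent: the model list, paths, and output names are placeholders, and the flags are the GPTQ-for-LLaMa ones as I understand them.

```
#!/bin/bash
# Sketch of an overnight quantization loop -- model list, paths, and output names are placeholders
. ../venv/bin/activate
for m in llama-7b llama-13b llama-30b llama-65b; do
    python ../repositories/GPTQ-for-LLaMa/llama.py "$m" c4 --new-eval \
        --wbits 4 --true-sequential \
        --save_safetensors "${m}-4bit.safetensors"
done
```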
> @USBhost, one last thing: if you create a torrent, make sure to put the tokenizer and config.json files in the respective folders just like in https://huggingface.co/ozcur/alpaca-native-4bit so that we...
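In case it helps anyone else packaging these, this is the layout I understand that to mean. The file names are the usual HF LLaMA conversion output, and the source/target paths here are placeholders.

```
# Hypothetical packaging step: put the HF tokenizer/config next to the quantized weights
# so the web UI can load the folder directly (paths are placeholders)
mkdir -p models/llama-65b-4bit-128g
cp llama-65b-hf/config.json \
   llama-65b-hf/tokenizer.model \
   llama-65b-hf/tokenizer_config.json \
   llama-65b-hf/special_tokens_map.json \
   models/llama-65b-4bit-128g/
cp llama-65b-4bit-128g.safetensors models/llama-65b-4bit-128g/
```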
> And make sure it says `LlamaTokenizer` rather than `LLaMaTokenizer`, otherwise we'll have another big round of support issues 😄

It's straight from the latest convert script, so I think...
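A quick way to double-check that on the finished folder (just a grep; the path is a placeholder):

```
# Should print "LlamaTokenizer"; if it still shows the old LLaMaTokenizer spelling,
# the folder was made with an outdated convert script
grep tokenizer_class models/llama-65b-4bit-128g/tokenizer_config.json
```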
Just a heads up, I am running into some interesting things: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/78