4bit version of gpt4all-alpaca-oa-codealpaca-Lora-13b?
Hello, to reduce my brain usage even more I thought it'd be nice to run an AI that is specifically trained to code and thus hopefully writes better code than other language models trained for, e.g., natural language.
So I found this: https://huggingface.co/jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b
I of course wanted to try and run it, but there's a problem: there aren't any pytorch_model files in the repo, and no 4-bit variants are listed here: https://github.com/underlines/awesome-marketing-datascience/blob/master/awesome-ai.md
Thank you for your support!
llama.cpp can now load LoRA adapters: you need to convert the LoRA model to ggml using convert-lora-to-ggml.py, then load the original LLaMA 13B as the model and your LoRA model on top of it when launching ./main -m llama-13b.bin --lora lora-model.bin. Something like that.
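Roughly, the whole thing could be driven like this (just a sketch: all paths are placeholders, and the adapter output filename may differ between llama.cpp versions):

```python
# Rough sketch of the two steps above; paths and filenames are placeholders.
import subprocess

LORA_DIR = "gpt4all-alpaca-oa-codealpaca-lora-13b"  # downloaded HF LoRA repo
BASE_MODEL = "models/llama-13b.bin"                 # converted base LLaMA 13B

# 1. Convert the HF LoRA adapter to a ggml adapter file.
subprocess.run(["python", "convert-lora-to-ggml.py", LORA_DIR], check=True)

# 2. Run llama.cpp with the base model and the LoRA adapter on top.
subprocess.run([
    "./main",
    "-m", BASE_MODEL,
    "--lora", f"{LORA_DIR}/ggml-adapter-model.bin",
    "-p", "Write a Python function that reverses a string.",
], check=True)
```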
--lora partially addresses the question, but the https://huggingface.co/jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b repo also mentions a few embeddings that are needed to support the custom tokens they use:
<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
Is there a way to work around this with existing llama.cpp options or would it require a PR?
You're right, --lora doesn't support extending the tokenizer yet. In that case the model should be saved as a .pth checkpoint and converted from that; llama.cpp itself can load the tokens from the model file.
Can anyone convert this model so it can be loaded? I'm particularly interested in using these models to write and work with code.
I created this script to merge the models: https://gist.github.com/SlyEcho/477554916bfc1a9e338240eee6396fbd
It creates a HF checkpoint that can be converted using convert.py to ggml f16 format and then later to q4_0 with quantize.
However, I'm not sure that the extra tokens are being used for tokenization.
EDIT:
It seems to work even with the text versions of <|prompter|> <|assistant|>...
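For anyone who wants to see roughly what such a merge does without reading the gist, here is a sketch using transformers + peft (this is not the gist itself; model IDs are taken from this thread, and it assumes the LoRA repo ships tokenizer files that include the added tokens):

```python
# Sketch of a LoRA merge into a full HF checkpoint (not the actual gist).
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

BASE = "decapoda-research/llama-13b-hf"
LORA = "jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b"

# The LoRA repo's tokenizer carries the added tokens (<|prompter|> etc.).
tokenizer = LlamaTokenizer.from_pretrained(LORA)

base = LlamaForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
base.resize_token_embeddings(len(tokenizer))  # make room for the new tokens

model = PeftModel.from_pretrained(base, LORA)
model = model.merge_and_unload()              # fold LoRA weights into the base

model.save_pretrained("merged-13b-hf")        # HF checkpoint for convert.py
tokenizer.save_pretrained("merged-13b-hf")
```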
So by converting the files with the ggml Python script we can use gpt4all-alpaca-oa-codealpaca-Lora-13b, but not as a single file. But your script, @SlyEcho, can do that?
Edit: For LLaMA I only have consolidated.00.pth and consolidated.01.pth.
The script should download 13b from huggingface.co/decapoda-research/llama-13b-hf automatically.
I also tried the --lora adapter and it technically works, but the tokens don't work and it is slower.
Thank you, your script worked and I now have the .bin shards of the 13b model merged. The thing to do now is to get it to f16 and then to 4-bit, but which convert.py script do you use? There are different ones.
convert.py from the master branch of this repo can handle HF format models now. You can specify the output format, but for some reason it didn't let me use q4_0, so I used f16 and then I ran ./quantize on it to get it down to q4_0
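As an outline of that two-step path (flag names and the quantize argument vary between llama.cpp versions, and the paths are placeholders):

```python
# HF checkpoint -> ggml f16 -> q4_0; treat as an outline, not exact commands.
import subprocess

HF_DIR = "merged-13b-hf"            # merged HF checkpoint from the gist
F16 = "ggml-model-f16.bin"
Q4 = "ggml-model-q4_0.bin"

# Convert the HF model to ggml f16.
subprocess.run(["python", "convert.py", HF_DIR,
                "--outtype", "f16", "--outfile", F16], check=True)

# Quantize down to q4_0; older builds take a number (2) instead of "q4_0".
subprocess.run(["./quantize", F16, Q4, "q4_0"], check=True)
```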
Can anyone upload the .bin file of this model for use with llama.cpp?
I could, but there is no point because it doesn't work well.
Thanks anyway :)
Where can I find this? I can only find the conversion scripts for gpt4all etc.
Thanks
> convert.py from the master branch of this repo can handle HF format models now. You can specify the output format, but for some reason it didn't let me use q4_0, so I used f16 and then I ran ./quantize on it to get it down to q4_0
This one: convert.py
edit: if you are seeing gpt4all conversion scripts, then you may need to do a git pull
Thank you, don't know how I didn't see that.
> I created this script to merge the models: https://gist.github.com/SlyEcho/477554916bfc1a9e338240eee6396fbd
> It creates a HF checkpoint that can be converted using convert.py to ggml f16 format and then later to q4_0 with quantize.
> However, I'm not sure that the extra tokens are being used for tokenization.
> EDIT:
> It seems to work even with the text versions of <|prompter|> <|assistant|>...
I have gotten a vocab size mismatch; how can I fix that?
You need to use the vocab files from the jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b repo. convert.py can also read vocab files from another directory, so you can point it to wherever the HF downloader wrote the files on your disk, or just download them.
But there are some weird things going on: there are embeddings for 16 new tokens in there, but the JSON only specifies 5. My script also cuts it down to 5, but you may want to hack on this because I don't understand how it's supposed to work.
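One way to see that mismatch for yourself is to compare added_tokens.json against the embedding size of the merged checkpoint (a quick check, assuming the merged HF directory from earlier; the base LLaMA vocab is 32000):

```python
# Compare the added-token list with the embedding rows of the merged model.
import json
from transformers import LlamaForCausalLM

with open("gpt4all-alpaca-oa-codealpaca-lora-13b/added_tokens.json") as f:
    added = json.load(f)                  # e.g. {"<|prompter|>": 32002, ...}
print(len(added), "tokens in added_tokens.json:", sorted(added, key=added.get))

model = LlamaForCausalLM.from_pretrained("merged-13b-hf")
rows = model.get_input_embeddings().weight.shape[0]
print("embedding rows:", rows, "->", rows - 32000, "rows beyond the base vocab")
```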
I had forgotten to put added_tokens.json in the directory. Thanks, it works now!
If you change main.cpp around line 173 to this, it should use the tokens for -ins mode:
// prefix & suffix for instruct mode
const auto inp_pfx = std::vector<llama_token> { 32002 }; // <|prompter|>
const auto inp_sfx = std::vector<llama_token> { 32004 }; // <|assistant|>
edit: I think the </s> or EOS token is not needed after all. Without it, it works better.
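If your merge ends up with different IDs, a quick check before hard-coding them into main.cpp (assuming the merged HF checkpoint from earlier in the thread) could look like this:

```python
# Print the IDs the added tokens actually got in the merged checkpoint.
from transformers import LlamaTokenizer

tok = LlamaTokenizer.from_pretrained("merged-13b-hf")
for t in ("<|prompter|>", "<|assistant|>"):
    print(t, "->", tok.convert_tokens_to_ids(t))
```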
The output I get is also a bit weird; it doesn't want to write code. It wanted me to visit a GitHub repo that doesn't exist.
I can recommend other good models that are not LoRA:
- chavinlo/alpaca-native 7b model
- chavinlo/alpaca-13b
- chavinlo/gpt4-x-alpaca 13b, new, I haven't tested much
These can be converted directly with convert.py and used with the instruct mode since they use the same Alpaca prompts.
I believe this has been answered!