4bit version of gpt4all-alpaca-oa-codealpaca-Lora-13b?
Hello, to reduce my brain usage even more I thought it'd be nice to run an AI that is specifically trained to code and thus hopefully writes better code than other language models trained for, e.g., natural language.
So I found this: https://huggingface.co/jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b
I of course wanted to try and run it, but there's a problem: there aren't any pytorch_model files in the repo, and no 4-bit variants are listed here: https://github.com/underlines/awesome-marketing-datascience/blob/master/awesome-ai.md
Thank you for your support!
llama.cpp can now load LoRA adapters: you need to convert the LoRA model to ggml using convert-lora-to-ggml.py, then load the original LLaMA 13B as the model and your LoRA model on top of it when launching ./main -m llama-13b.bin --lora lora-model.bin. Something like that.
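Roughly, the whole thing could be driven like this (just a sketch: all paths are placeholders, and the adapter output filename may differ between llama.cpp versions):

```python
# Rough sketch of the two steps above; paths and filenames are placeholders.
import subprocess

LORA_DIR = "gpt4all-alpaca-oa-codealpaca-lora-13b"  # downloaded HF LoRA repo
BASE_MODEL = "models/llama-13b.bin"                 # converted base LLaMA 13B

# 1. Convert the HF LoRA adapter to a ggml adapter file.
subprocess.run(["python", "convert-lora-to-ggml.py", LORA_DIR], check=True)

# 2. Run llama.cpp with the base model and the LoRA adapter on top.
subprocess.run([
    "./main",
    "-m", BASE_MODEL,
    "--lora", f"{LORA_DIR}/ggml-adapter-model.bin",
    "-p", "Write a Python function that reverses a string.",
], check=True)
```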
--lora partially addresses the question, but the https://huggingface.co/jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b repo also mentions a few embeddings that are needed to support the custom tokens they use:
<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
Is there a way to work around this with existing llama.cpp options or would it require a PR?
You're right, --lora doesn't support extending the tokenizer yet. In that case the model should be saved as a .pth checkpoint and converted from that; llama.cpp itself can load the tokens from the model file.
Can anyone convert this model so it can be loaded? I'm particularly interested in using these models to write and work with code.
I created this script to merge the models: https://gist.github.com/SlyEcho/477554916bfc1a9e338240eee6396fbd
It creates a HF checkpoint that can be converted using convert.py to ggml f16 format and then later to q4_0 with quantize.
However, I'm not sure that the extra tokens are being used for tokenization.
EDIT:
It seems to work even with the text versions of <|prompter|> <|assistant|>...
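For anyone who wants to see roughly what such a merge does without reading the gist, here is a sketch using transformers + peft (this is not the gist itself; model IDs are taken from this thread, and it assumes the LoRA repo ships tokenizer files that include the added tokens):

```python
# Sketch of a LoRA merge into a full HF checkpoint (not the actual gist).
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

BASE = "decapoda-research/llama-13b-hf"
LORA = "jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b"

# The LoRA repo's tokenizer carries the added tokens (<|prompter|> etc.).
tokenizer = LlamaTokenizer.from_pretrained(LORA)

base = LlamaForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)
base.resize_token_embeddings(len(tokenizer))  # make room for the new tokens

model = PeftModel.from_pretrained(base, LORA)
model = model.merge_and_unload()              # fold LoRA weights into the base

model.save_pretrained("merged-13b-hf")        # HF checkpoint for convert.py
tokenizer.save_pretrained("merged-13b-hf")
```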
So by converting the files with the ggml Python script we can use gpt4all-alpaca-oa-codealpaca-Lora-13b, but not as a single file. But your script, @SlyEcho, can do that?
Edit: For LLaMA I only have consolidated.00.pth and consolidated.01.pth.
The script should download 13b from huggingface.co/decapoda-research/llama-13b-hf automatically.
I also tried the --lora adapter and it technically works, but the tokens don't work and it is slower.
Thank you, your script worked and I now have the .bin shards of the 13b model merged. The thing to do now is to get it to f16 and then to 4-bit, but which convert.py script do you use? There are different ones.
convert.py from the master branch of this repo can handle HF format models now. You can specify the output format, but for some reason it didn't let me use q4_0, so I used f16 and then I ran ./quantize on it to get it down to q4_0
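As an outline of that two-step path (flag names and the quantize argument vary between llama.cpp versions, and the paths are placeholders):

```python
# HF checkpoint -> ggml f16 -> q4_0; treat as an outline, not exact commands.
import subprocess

HF_DIR = "merged-13b-hf"            # merged HF checkpoint from the gist
F16 = "ggml-model-f16.bin"
Q4 = "ggml-model-q4_0.bin"

# Convert the HF model to ggml f16.
subprocess.run(["python", "convert.py", HF_DIR,
                "--outtype", "f16", "--outfile", F16], check=True)

# Quantize down to q4_0; older builds take a number (2) instead of "q4_0".
subprocess.run(["./quantize", F16, Q4, "q4_0"], check=True)
```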
Can anyone upload the .bin file of this model for use with llama.cpp?
I could, but there is no point because it doesn't work well.
Thanks anyway :)
Where can I find this? I can only find the conversion scripts for gpt4all etc.
Thanks
> convert.py from the master branch of this repo can handle HF format models now. You can specify the output format, but for some reason it didn't let me use q4_0, so I used f16 and then I ran ./quantize on it to get it down to q4_0
This one: convert.py
edit: if you are seeing gpt4all conversion scripts, then you may need to do a git pull
Thank you, don't know how I didn't see that.
> I created this script to merge the models: https://gist.github.com/SlyEcho/477554916bfc1a9e338240eee6396fbd
> It creates a HF checkpoint that can be converted using convert.py to ggml f16 format and then later to q4_0 with quantize.
> However, I'm not sure that the extra tokens are being used for tokenization.
> EDIT:
> It seems to work even with the text versions of <|prompter|> <|assistant|>...
I have gotten a vocab size mismatch; how can I fix that?
You need to use the vocab files from the jordiclive/gpt4all-alpaca-oa-codealpaca-lora-13b repo. convert.py can also read vocab files from another directory, so you can point it to wherever the HF downloader wrote the files on your disk, or just download them.
But there are some weird things going on: there are embeddings for 16 new tokens in there, but the JSON only specifies 5. My script also cuts it down to 5, but you may want to hack on this because I don't understand how it's supposed to work.
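One way to see that mismatch for yourself is to compare added_tokens.json against the embedding size of the merged checkpoint (a quick check, assuming the merged HF directory from earlier; the base LLaMA vocab is 32000):

```python
# Compare the added-token list with the embedding rows of the merged model.
import json
from transformers import LlamaForCausalLM

with open("gpt4all-alpaca-oa-codealpaca-lora-13b/added_tokens.json") as f:
    added = json.load(f)                  # e.g. {"<|prompter|>": 32002, ...}
print(len(added), "tokens in added_tokens.json:", sorted(added, key=added.get))

model = LlamaForCausalLM.from_pretrained("merged-13b-hf")
rows = model.get_input_embeddings().weight.shape[0]
print("embedding rows:", rows, "->", rows - 32000, "rows beyond the base vocab")
```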
I had forgotten to put added_tokens.json in the directory. Thanks, it works now!
If you change main.cpp around line 173 to this, it should use the tokens for -ins mode:
// prefix & suffix for instruct mode
const auto inp_pfx = std::vector<llama_token> { 32002 }; // <|prompter|>
const auto inp_sfx = std::vector<llama_token> { 32004 }; // <|assistant|>
edit: I think the </s> or EOS token is not needed after all. Without it, it works better.
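If your merge ends up with different IDs, a quick check before hard-coding them into main.cpp (assuming the merged HF checkpoint from earlier in the thread) could look like this:

```python
# Print the IDs the added tokens actually got in the merged checkpoint.
from transformers import LlamaTokenizer

tok = LlamaTokenizer.from_pretrained("merged-13b-hf")
for t in ("<|prompter|>", "<|assistant|>"):
    print(t, "->", tok.convert_tokens_to_ids(t))
```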
The output I get is also a bit weird; it doesn't want to write code. It wanted me to visit a GitHub repo that doesn't exist.
I can recommend other good models that are not LoRA:
- chavinlo/alpaca-native 7b model
- chavinlo/alpaca-13b
- chavinlo/gpt4-x-alpaca 13b, new, I haven't tested much
These can be converted directly with convert.py and used with the instruct mode since they use the same Alpaca prompts.
I believe this has been answered!