llama.cpp Can this code base be extended to support other transformer-based LLMs such as Pythia or its instruction-tuned version Open Assistant?

Mar 17 '23 01:03 michaelbogdan

If you provided links and more information about Pythia it would make it easier for someone to know what Pythia is and if the model can be converted into the format that llama.cpp supports.

See #172 for Stanford's Alpaca.

Mar 17 '23 08:03 gjmulder

Sorry for the sparse information. I wanted to get initial feedback on whether there is interest in extending the code base to support more LLMs, since the project is named after just one of the LLMs it can run. I have put the required information below. Should I edit the initial post for better visibility?

Pythia is a suite of fully open source LLMs by EleutherAI, developed for interpretability research and trained on completely open data. The Hugging Face page is quite explanatory, I think:

The Pythia Scaling Suite is a collection of models developed to facilitate interpretability research. It contains two sets of eight models of sizes 70M, 160M, 410M, 1B, 1.4B, 2.8B, 6.9B, and 12B. For each size, there are two models: one trained on the Pile, and one trained on the Pile after the dataset has been globally deduplicated. All 8 model sizes are trained on the exact same data, in the exact same order. All Pythia models are available on Hugging Face.

The Pythia model suite was deliberately designed to promote scientific research on large language models, especially interpretability research. Despite not centering downstream performance as a design goal, we find the models match or exceed the performance of similar and same-sized models, such as those in the OPT and GPT-Neo suites.

Open Assistant (GitHub, Website) is a project by LAION to create a fully open instruction-tuned LLM, similar to Alpaca or GPT-3.5. The project consists of multiple parts and phases, one of which is to gather data from volunteers on how the assistant should reply to queries. An initial portion of this data has been used to fine-tune Pythia-12B to follow instructions, and the research preview has been published on Hugging Face as oasst-sft-1-pythia-12b.

The model is published at 8-bit precision. Considering the great success of llama.cpp at decreasing model sizes through quantization to 4 or possibly even 3 bits, I think there is great synergy between the projects, as the massively lower hardware requirements could make Open Assistant even more accessible in practical terms.
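To make the size argument concrete, here is a rough back-of-the-envelope sketch. It assumes a nominal 12e9 parameters for Pythia-12B and counts only the raw weight bits; real quantized files also store per-block scale factors and some unquantized tensors, so actual sizes will be somewhat larger. The function name `approx_weight_gb` is made up for this illustration.

```python
def approx_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the raw weights in decimal gigabytes.

    Assumption: every weight uses exactly `bits_per_weight` bits and
    there is no overhead for scales, metadata, or unquantized layers.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 1e9


# Nominal parameter count for Pythia-12B (an assumption, not an exact figure).
N_PARAMS = 12e9

for bits in (16, 8, 4, 3):
    print(f"{bits:>2} bits/weight: ~{approx_weight_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions, going from 8 to 4 bits per weight halves the weight storage, which is the main reason quantization lowers the memory needed to load the model.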

I quote the model card of the research preview here:

This is the first iteration English supervised-fine-tuning (SFT) model of the Open-Assistant project. It is based on a Pythia 12B that was fine-tuned on ~22k human demonstrations of assistant conversations collected through the https://open-assistant.io/ human feedback web app before March 7, 2023.

Mar 17 '23 15:03 michaelbogdan

@gjmulder I'm going to hijack this issue to expand my question to cover all available FOSS LLMs. (WIP)

BLOOM is a massive LLM at 176B parameters, developed by a collaboration under Hugging Face leadership. There is a version of the model that has only 7.1B parameters and can feasibly run on a laptop using the related project bloomz.cpp. Should this project be extended to include BLOOM, be merged with the bloomz.cpp effort, or should the two be kept as separate projects? I am not currently aware of any efforts to instruction-tune this LLM.

GPT-NeoX is a 20B parameter model trained on The Pile, the largest publicly available dataset for LLM training. There is an effort by OpenChatKit to create an instruction-tuned version of GPT-NeoX as an open source alternative to ChatGPT. The model itself is released on Hugging Face as GPT-NeoXT-Chat-Base-20B.

GPT-Neo (tbc)

GPT-J (tbc)

GPT-2 (tbc)

Mar 18 '23 09:03 michaelbogdan

Are we basically making an open source ts_server (https://bellard.org/ts_server/) now? If so, I also nominate RWKV (https://github.com/BlinkDL/RWKV-LM)

Mar 18 '23 21:03 Ronsor

@Ronsor I am not sure, which is why I am asking and doing some preliminary research. The maintainer(s) can of course decide that running models other than LLaMA is out of scope.

Mar 19 '23 02:03 michaelbogdan

Hi @michaelbogdan, we (at nolano.org) are working on something akin to Hugging Face (a Python-based interface) for running LLMs, but with a C/C++ backend for speed. For tomorrow's initial release, we plan to have GPT-J and BLOOM. GPT-NeoX (and hence Pythia and OpenAssistant) is next; I will also get these out by tomorrow.

Mar 19 '23 03:03 Ayushk4

Update: It's here: https://github.com/NolanoOrg/cformers. I am working on getting the Pythia/GPT-NeoX models out next. OpenAssistant/oasst-sft-1-pythia-12b is much better than Alpaca and will be added soon.

Mar 20 '23 05:03 Ayushk4

OpenAssistant and Open-Chat-Kit models have been added to nolanoorg/cformers.

You can now interface with the models in just 3 lines of Python:

from interface import AutoInference as AI
ai = AI('OpenAssistant/oasst-sft-1-pythia-12b')
x = ai.generate("<|prompter|>What's the Earth total population<|endoftext|><|assistant|>", num_tokens_to_generate=100); print(x['token_str'])
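The prompt string in that call follows a fixed pattern: the user's text sits between `<|prompter|>` and `<|endoftext|>`, and a trailing `<|assistant|>` cues the model to reply. A minimal sketch of a helper that builds such a single-turn prompt might look like this; `build_prompt` and the constant names are hypothetical and not part of cformers, and only the single-turn layout shown above is assumed.

```python
# Special tokens exactly as they appear in the example prompt above.
PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
END = "<|endoftext|>"


def build_prompt(user_message: str) -> str:
    """Wrap one user message in the single-turn prompt layout.

    Hypothetical helper for illustration; it only concatenates strings
    and does not call any model or library.
    """
    return f"{PROMPTER}{user_message}{END}{ASSISTANT}"


prompt = build_prompt("What's the Earth total population")
print(prompt)
```

The resulting string could then be passed as the first argument to `ai.generate(...)` in place of the hand-written prompt.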

So far 10 different models are supported across 5 different architectures.

Mar 25 '23 17:03 Ayushk4