feature: identify model file by SHA
If the API supported model aliases, it would be easier to plug into existing web UIs without having to support each one of them directly. It seems all the UIs out there filter by known models, even if the API returns the list of supported models.
Maybe we could pass the model via the token? However, that won't spare us from UIs filtering what /models actually returns.
A config file could be:
- model: my-alpaca
  # file: alpaca.bin
  checksum:
    - sha256: ....
    - sha1: ....
  top_p: ..
  temperature: ...
  url:
    - url-to-fetch-model
  stop_words:
    - <SYSTEM>
  alias:
    - gpt3
  template:
    completion: file.tmpl
    chat: file.tmpl
- model: my-cerebras
  # file: cerebras.bin
  checksum:
    - sha256: ....
    - sha1: ....
  top_p: ..
  temperature: ...
  stop_words:
    - <SYSTEM>
  alias:
    - gpt4
  template:
    completion: file.tmpl
    chat: file.tmpl
Another example:
models:
  - name: ggml-gpt4all-j
    checksum:
      - sha256: ...
    path: /models/ggml-gpt4all-j.bin
    backend: gpt4all-j
    defaults:
      top_p: ...
      stop_words: ...
    prompts:
      - completion: ...
      - chat: ...
    aliases:
      - gpt-3.5-turbo
      - gpt4
  - name: cerebras-2.7b-q4_0
    checksum:
      - sha256: ...
    path: /models/cerebras-2.7b-q4_0.bin
    backend: gpt2
    defaults:
      top_p: ...
      stop_words: ...
    prompts:
      - completion: ...
      - chat: ...
    aliases:
      - gpt2
You could even point to the same model sha/path, give it a different name, and use different prompts. This allows maximum flexibility for each user's needs.
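For illustration, here is a minimal sketch of how such a config could be parsed in Go, assuming gopkg.in/yaml.v3; the type and field names are made up for this example and are not the actual LocalAI schema:

package config

import (
    "os"

    "gopkg.in/yaml.v3"
)

// ModelConfig mirrors one entry of the "models:" list sketched above.
type ModelConfig struct {
    Name     string              `yaml:"name"`
    Checksum []map[string]string `yaml:"checksum"` // e.g. [{sha256: ...}, {sha1: ...}]
    Path     string              `yaml:"path"`
    Backend  string              `yaml:"backend"`
    Defaults Defaults            `yaml:"defaults"`
    Prompts  []map[string]string `yaml:"prompts"`
    Aliases  []string            `yaml:"aliases"`
}

type Defaults struct {
    TopP      float64  `yaml:"top_p"`
    StopWords []string `yaml:"stop_words"`
}

type Config struct {
    Models []ModelConfig `yaml:"models"`
}

// Load reads the YAML file and decodes it into the structures above.
func Load(path string) (*Config, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, err
    }
    var c Config
    if err := yaml.Unmarshal(data, &c); err != nil {
        return nil, err
    }
    return &c, nil
}

Note that two entries can simply reuse the same path and checksum while differing in name, prompts, and aliases, which is what gives the flexibility mentioned above.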
As discussed on Discord, my proposal is below. It takes into consideration:
- Separating models from configurations
  - use one configuration for many models
  - use one model for multiple things (per API path or model name)
  - use different models for different things (same as above)
- The fact that we emulate the OpenAI API and have limited info at the server endpoint
  - only "model" and the API path are known
- Reducing the boilerplate of repeating prompts etc. for each model file
Folder structure:
│
├─ models
│  └─ mymodel.bin
│
└─ configs
   ├─ generic_completion.yaml
   └─ stableml.yaml
Content of one config yaml:
name: generic_completion
shas:
  - 81237981273
  - 1283128937
paths:
  - v1/chat
  - v1/completion
model_names:
  - gpt-3.5-turbo
  - gpt-4
# (backend: gpt4all-j)?
top_p: ...
stop_words:
  - <|USER|>
  - <|SYSTEM|>
completion_prompt: Complete the following text.
chat_prompt: This is the discussion of an AI and a Human...
chat_system: <|SYSTEM|>
chat_user: <|USER|>
chat_assistant: <|ASSISTANT|>
How it works

When a request comes in (a rough sketch follows the list):
- Model name and API path are known (from the request path + JSON body)
- Configs are scanned for that combination of path and name
- If a combination is found, its list of SHAs is checked
- We look for a fitting model in memory or on disk
- If none is found -> go back two steps
- We apply the config and run the inference
- Of course, all of this should be buffered (cached)
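A minimal sketch of that lookup in Go, just to make the flow concrete; every type and function name here is hypothetical, and the SHAs of the files in the models folder are assumed to have been computed up front:

package server

// RouteConfig mirrors one config yaml from above (shas, paths, model_names).
type RouteConfig struct {
    Name       string
    SHAs       []string
    Paths      []string
    ModelNames []string
}

// findConfig scans the configs for one matching the request's API path and model name.
func findConfig(configs []RouteConfig, apiPath, modelName string) (*RouteConfig, bool) {
    for i := range configs {
        c := &configs[i]
        if contains(c.Paths, apiPath) && contains(c.ModelNames, modelName) {
            return c, true
        }
    }
    return nil, false
}

// resolveModel checks the config's SHA list first against models already loaded
// in memory, then against files found on disk (both indexed by SHA).
// If nothing matches, the caller goes back and tries the next matching config.
func resolveModel(c *RouteConfig, inMemory map[string]bool, onDisk map[string]string) (string, bool) {
    for _, sha := range c.SHAs {
        if inMemory[sha] {
            return sha, true
        }
        if _, found := onDisk[sha]; found {
            return sha, true
        }
    }
    return "", false
}

func contains(list []string, v string) bool {
    for _, s := range list {
        if s == v {
            return true
        }
    }
    return false
}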
User Side
- There could be default config files with known-to-work shas.
- These would auto-configure any known bin files in the models folder
- If a user-created config.yaml file overrides a path/name/sha combination, it is used instead.
- The default configs could be updated from the interwebs every now and then
The user can, for example:
- Copy a default config.yaml and adapt it for their needs
- Change the prompt or use a different model (add a sha)
- Add a different model name so the configuration can be used via the OpenAI API request
Edit: Alternatively, model_name and the config.yaml name could be conflated to make it a bit simpler but less flexible (e.g. rename generic_completion to gpt-4).
Edit2: Actually, the backend entry doesn't really fit in there, as one might be able to use the config with a different backend.
I suggest that the backend should instead be determined when the SHAs of the existing files are scanned.
Edit3: As the user might find it difficult to get a SHA for their model file, we might add the option for the sha entry to be a path instead?
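For the default case, the server can also compute the SHA itself while scanning the models folder; a minimal sketch (the fileSHA256 name is made up):

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "os"
)

// fileSHA256 streams the file through SHA-256 and returns the hex digest.
func fileSHA256(path string) (string, error) {
    f, err := os.Open(path)
    if err != nil {
        return "", err
    }
    defer f.Close()

    h := sha256.New()
    if _, err := io.Copy(h, f); err != nil {
        return "", err
    }
    return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
    sum, err := fileSHA256("models/mymodel.bin")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println(sum)
}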
Pivoted this to a specific action item since we now have a "base" configuration mechanism, and a wide refactor was needed before jumping into it. Now we can just build up from the current implementation. What I did is simply expose the parameters available from the OpenAI endpoint as parameters in the model block. I've kept the model block very simple for now - just specifying the name and the model file it will be associated with, but no heavy business logic.
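To illustrate the idea (this is not the actual LocalAI schema), request-level parameters can simply override the defaults attached to the model block; a rough sketch with hypothetical names:

package api

// OpenAIRequest holds the parameters a client may send on the OpenAI-compatible endpoint.
// Pointer fields distinguish "not set" from an explicit zero value.
type OpenAIRequest struct {
    Model       string
    TopP        *float64
    Temperature *float64
}

// ModelDefaults holds the defaults declared in the model block.
type ModelDefaults struct {
    TopP        float64
    Temperature float64
}

// merge applies request overrides on top of the model block's defaults.
func merge(defaults ModelDefaults, req OpenAIRequest) ModelDefaults {
    out := defaults
    if req.TopP != nil {
        out.TopP = *req.TopP
    }
    if req.Temperature != nil {
        out.Temperature = *req.Temperature
    }
    return out
}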
:warning::warning::warning::warning::warning:
Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!
but... I can also be funny or helpful :smile_cat: and, generally speaking, I can provide good tips or point to places to look in the documentation or in the code based on what you wrote in the issue.
Don't engage in conversation with me, I don't support replying (yet)!
:warning::warning::warning::warning::warning:
To add model aliases, you can create a file in the /models directory called "aliases.json" with the following content:

{
  "bert": {
    "alias": "whisper",
    "description": "A model for audio transcription"
  }
}

This will allow users to install the "whisper" model by passing its alias instead of the original name. The UI can then use this file to display the correct model name and description in the installation prompt.
Sources:
- https://localai.io/advanced/index.html
- https://github.com/go-skynet/LocalAI/tree/master/main.go
- https://localai.io/models/index.html
This lost interest and no one is actually looking at it. Closing.