planning: Supporting vision models (LLaVA and Llama 3.2)
Problem Statement
To support Vision models on Cortex, we need the following:
- [ ] 1. Download model .gguf and mmproj file
- [ ] 2. v1/models/start takes in model_path (.gguf) and mmproj parameters
- [ ] 3. /chat/completions to take in messages content image_url
- [ ] 4. image_url has to be encoded in base64 (via Jan, or a link to a tool, e.g. https://base64.guru/converter/encode/image); see the request sketch after this list
- [ ] 5. model support - (side note: Jan currently supports BakLlava 1, llava 7B, Llava 13B)
- [ ] 6. Pull correct NGL settings from chat model. Ref issue #1763
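To make items 2-4 concrete, here is a minimal request sketch assuming an OpenAI-style message payload. The endpoint paths and parameter names (model_path, mmproj, image_url) are taken from the checklist above; the server address, model id, file names, and the exact payload/response shapes are assumptions, not the final API.

```python
import base64

import requests  # third-party HTTP client, used only to keep the sketch short

CORTEX = "http://127.0.0.1:39281"  # assumed local Cortex server address

# Items 1-2: start a vision model by passing both the chat GGUF and the mmproj projector.
requests.post(f"{CORTEX}/v1/models/start", json={
    "model": "llava-v1.6-mistral-7b",                          # hypothetical model id
    "model_path": "models/llava-v1.6-mistral-7b.Q3_K_M.gguf",
    "mmproj": "models/mmproj-model-f16.gguf",                   # hypothetical projector path
})

# Items 3-4: send a chat completion whose content carries a base64-encoded image_url.
with open("cat.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(f"{CORTEX}/chat/completions", json={
    "model": "llava-v1.6-mistral-7b",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in the image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
})
print(resp.json())
```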
1. Downloading model .gguf and mmproj file:
To be fully compatible with Jan, Cortex should be able to pull the mmproj file along with the GGUF file. Take the following scenario as an example.
Scenario steps:
- The user wants to download a LLaVA model and expects it to support vision, so the user inputs either:
  - a direct URL to the GGUF file (e.g. llava-v1.6-mistral-7b.Q3_K_M.gguf), or
  - a URL to the repository (we list the options, filtered to .gguf files, for the user to select). Since the mmproj file also ends with .gguf, it is listed in the selection too.
- Cortex will only pull the selected GGUF file, ignoring that:
  - the mmproj .gguf alone won't work;
  - the traditional GGUF file alone (e.g. llava-v1.6-mistral-7b.Q3_K_M.gguf) will not have the vision feature.
So we need a way for Cortex to know when to download the mmproj file along with the traditional GGUF file.
cc @dan-homebrew , @louis-jan , @nguyenhoangthuan99, @vansangpfiev
Feature Idea
A couple of thoughts:
- File-name based (a sketch follows this list):
  1.1. For CLI: ignore file names containing mmproj when presenting the selection list, and download the mmproj file along with the selected traditional GGUF file.
  1.2. For API: always scan the directory at the same level as the provided URL. If a file name contains mmproj, Cortex adds it to the download list.
  - Edge case: if the user provides a direct URL to an mmproj file, return an error with a clear message.
- Thinking / you tell me.
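A minimal sketch of the file-name-based idea, assuming Cortex already has the list of file names at the same repository level as the selected URL. The helper name and the repository layout shown are hypothetical:

```python
from urllib.parse import urlparse


def resolve_download_list(selected_url: str, sibling_files: list[str]) -> list[str]:
    """Given the user's selected .gguf URL and the file names at the same
    repository level, decide which files Cortex should actually download."""
    selected_name = urlparse(selected_url).path.rsplit("/", 1)[-1]

    # Edge case: a direct URL to the projector alone is not runnable.
    if "mmproj" in selected_name.lower():
        raise ValueError("This is a multimodal projector (mmproj) file; "
                         "please select the chat model .gguf instead.")

    downloads = [selected_name]
    # If a projector sits next to the chat model, pull it too so the model
    # keeps its vision capability.
    downloads += [f for f in sibling_files
                  if f.lower().endswith(".gguf") and "mmproj" in f.lower()]
    return downloads


# Example: the llava-v1.6 repository layout from the scenario above.
print(resolve_download_list(
    "https://huggingface.co/some-repo/resolve/main/llava-v1.6-mistral-7b.Q3_K_M.gguf",
    ["llava-v1.6-mistral-7b.Q3_K_M.gguf", "mmproj-model-f16.gguf", "README.md"]))
# -> ['llava-v1.6-mistral-7b.Q3_K_M.gguf', 'mmproj-model-f16.gguf']
```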
Updates:
- CLI cortex pull presents .gguf and mmproj files
- mmproj param is added to /v1/models/start parameters in #1537
We should ensure that model.yaml supports this type of abstraction, cc @hahuyhoang411
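For the model.yaml abstraction, one possible shape is a single mmproj field next to the chat model file. The layout below is a hypothetical illustration of the idea, not an agreed schema:

```python
import yaml  # PyYAML, used only to illustrate parsing

# Hypothetical model.yaml for a vision model: the chat GGUF plus its projector.
MODEL_YAML = """
model: llava-v1.6-mistral-7b
engine: llama-cpp
files:
  - llava-v1.6-mistral-7b.Q3_K_M.gguf
mmproj: mmproj-model-f16.gguf
"""

config = yaml.safe_load(MODEL_YAML)
if config.get("mmproj"):
    # The loader would forward this to v1/models/start alongside model_path.
    print("vision model, projector:", config["mmproj"])
```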
@vansangpfiev and @hahuyhoang411 - can I get your thoughts to add to this list from my naive understanding?
To support Vision models on Cortex, we need the following:
- Download model - downloads .gguf and mmproj file -> What is the model pull UX?
- v1/models/start takes in model_path (.gguf) and mmproj parameters ✅
- /chat/completions to take in messages content image_url ✅
- image_url has to be encoded in base64 (via Jan, or link to tool eg https://base64.guru/converter/encode/image)
- model support - (side note: Jan currently supports BakLlava 1, llava 7B, Llava 13B) ..
> @vansangpfiev and @hahuyhoang411 - can I get your thoughts to add to this list from my naive understanding?
> To support Vision models on Cortex, we need the following:
> - Download model - downloads .gguf and mmproj file -> What is the model pull UX?
> - v1/models/start takes in model_path (.gguf) and mmproj parameters ✅
> - /chat/completions to take in messages content image_url ✅
> - image_url has to be encoded in base64 (via Jan, or link to tool eg https://base64.guru/converter/encode/image)
> - model support - (side note: Jan currently supports BakLlava 1, llava 7B, Llava 13B)
- I'm not sure about this yet, since 1 folder can have multiple chat model files with 1 mmproj file.
- Yes
- I'm not sure if this is a good UX
- image_url can be a local path to an image; the llama-cpp engine supports encoding the image to base64 and passing it to the model.
- The llama-cpp engine supports BakLlava 1, llava 7B, and llava 13B. llama.cpp upstream already supports MiniCPM-V 2.6, so we can integrate it into llama-cpp. llama.cpp upstream does not support Llama 3.2 vision yet.
We probably need to consider changing the UX for inferencing with vision models, for example:
cortex run llava-7b --image xx.jpg -p "What is in the image?"
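To make that proposed UX concrete, here is a rough client-side sketch: if --image points at a local file it is base64-encoded into a data URI (matching the note above that image_url can be a local path); otherwise it is passed through as a URL. The flag names mirror the example command; the data-URI handling and message shape are assumptions.

```python
import argparse
import base64
import mimetypes
import os


def image_to_content(image: str) -> dict:
    """Turn the --image value into an image_url content part.
    Local paths are base64-encoded; anything else is passed through as a URL."""
    if os.path.exists(image):
        mime = mimetypes.guess_type(image)[0] or "image/jpeg"
        with open(image, "rb") as f:
            data = base64.b64encode(f.read()).decode()
        return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{data}"}}
    return {"type": "image_url", "image_url": {"url": image}}


parser = argparse.ArgumentParser(prog="cortex run")
parser.add_argument("model")
parser.add_argument("--image", required=True)
parser.add_argument("-p", "--prompt", required=True)
args = parser.parse_args()

message = {
    "role": "user",
    "content": [{"type": "text", "text": args.prompt}, image_to_content(args.image)],
}
# `message` would then be sent to /chat/completions for the selected model.
print(args.model, message)
```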
Thank you @vansangpfiev and @hahuyhoang411! Quick notes from call:
- upstream llama.cpp -> cortex.llama-cpp needs to expose vision parameters to cortex.cpp
- Ease of model support: LLaVA first, then MiniCPM.
- Llama 3.2 vision (not yet supported in upstream llama.cpp)
Added an action item: model management should pull metadata from the chat model file instead of the projector file (just to make sure we track this).
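A small sketch of that action item, assuming the only decision here is which downloaded .gguf to read metadata from; the actual metadata parsing stays in the engine:

```python
def pick_metadata_source(gguf_files: list[str]) -> str:
    """Return the file whose metadata (layer count for NGL, etc.) should be read.
    The mmproj file carries the vision projector's metadata, not the chat
    model's, so it must never be used as the metadata source."""
    chat_models = [f for f in gguf_files if "mmproj" not in f.lower()]
    if not chat_models:
        raise ValueError("Only a projector (mmproj) file was found; no chat model .gguf.")
    return chat_models[0]


# Example with the llava layout used earlier in this thread.
print(pick_metadata_source(
    ["mmproj-model-f16.gguf", "llava-v1.6-mistral-7b.Q3_K_M.gguf"]))
# -> llava-v1.6-mistral-7b.Q3_K_M.gguf
```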