Transformer model openbmb/MiniCPM-Llama3-V-2_5 not supported
The bug
Loading and prompting the transformer model openbmb/MiniCPM-Llama3-V-2_5 does not work.
It appears to load the model (although, according to nvtop, nothing is allocated on my GPU) and no error is thrown. Prompting the LLM then returns immediately, without a response and without an error.
To Reproduce
```python
from guidance import models

lm = models.Transformers('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)
print(lm + "Hello?")
```
It is worth mentioning that openbmb provides a test script for transformers, which does work:
```python
# test.py
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True, torch_dtype=torch.float16)
model = model.to(device='cuda')
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-Llama3-V-2_5', trust_remote_code=True)
model.eval()

image = Image.open('xx.jpg').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': question}]

res = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,  # if sampling=False, beam_search will be used by default
    temperature=0.7,
    # system_prompt='',  # pass system_prompt if needed
)
print(res)
```
@nking-1 , have you come across this in your forays into multimodal models?
I actually do get an error on my machine during a forward pass:
```
TypeError: MiniCPMV.forward() missing 1 required positional argument: 'data'
```
(I can include the full traceback if helpful.)
It seems that this model departs from the standard huggingface model-call API that we're using (likely because of multimodality).
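To make the mismatch concrete, here is a minimal sketch using toy stand-in classes (these are hypothetical, not the real guidance or MiniCPM-V code): a generic backend that invokes models with `input_ids=...`, as is standard for Hugging Face causal LMs, raises the same kind of `TypeError` when the model's `forward` instead requires a positional `data` argument bundling its multimodal inputs.

```python
# Toy illustration of the calling-convention mismatch
# (hypothetical classes; not the actual guidance/MiniCPM implementations).

class StandardCausalLM:
    # Standard HF-style forward: accepts input_ids as a keyword argument.
    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        return {"logits": [len(input_ids)]}

    __call__ = forward


class MiniCPMVStyle:
    # MiniCPM-V-style forward: requires a positional 'data' dict
    # that bundles text and image inputs together.
    def forward(self, data, **kwargs):
        return {"logits": [len(data["input_ids"])]}

    __call__ = forward


def generic_backend_call(model, token_ids):
    # How a generic HF backend invokes any model it is handed.
    return model(input_ids=token_ids)


print(generic_backend_call(StandardCausalLM(), [1, 2, 3]))  # → {'logits': [3]}

try:
    generic_backend_call(MiniCPMVStyle(), [1, 2, 3])
except TypeError as e:
    print(e)  # ...missing 1 required positional argument: 'data'
```

Supporting such models would presumably require a model-specific adapter that builds the `data` payload before the forward pass.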
Checking in: what is the status on this one? MiniCPM 2.6 has been released, and I will try whether that works now. Otherwise, is there anything I can do to assist the Guidance dev team in resolving this issue? With the rise of "inner monologue" models like o1, it is clear that guidance will play a significant role in the LLM community in the near future, and resolving this kind of issue could be a major step toward supporting a broader audience.