
[BUG] llama.cpp: Random output when offloading to GPU

auriocus opened this issue 1 year ago • 5 comments

Is there an existing issue / discussion for this?

  • [X] I have searched the existing issues / discussions

Is there an existing answer for this in the FAQ?

  • [X] I have searched the FAQ

Current Behavior

The new MiniCPM-V 2.6 model runs correctly on this llama.cpp fork, using the GGUF files downloaded from Hugging Face, as long as the computation is done on the CPU. If I additionally pass "-ngl 50" to offload the computation to the GPU, it only emits repeating nonsense.
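
For concreteness, the two cases look roughly like this (model, mmproj, and image paths are placeholders; the exact commands appear in the logs below):

# CPU only: correct output
./llama-minicpmv-cli -m ggml-model-f16.gguf --mmproj mmproj-model-f16.gguf -p "Describe the image" --image test.jpg

# GPU offload (-ngl 50): repeating nonsense
./llama-minicpmv-cli -m ggml-model-f16.gguf --mmproj mmproj-model-f16.gguf -p "Describe the image" --image test.jpg -ngl 50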

Expected Behavior

The same behaviour with CPU (-ngl 0) and GPU (-ngl 50) inference.

Steps To Reproduce

No response

Environment

- OS: Linux
- Python: C++
- Transformers: llama.cpp
- PyTorch: llama.cpp
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.8

Anything else?

No response

auriocus avatar Aug 09 '24 15:08 auriocus

same issue here

luixiao0 avatar Aug 10 '24 10:08 luixiao0

Could you please tell me which branch you are using?

tc-mb avatar Aug 14 '24 10:08 tc-mb

same question

zhuchenxi avatar Aug 15 '24 07:08 zhuchenxi

Could you please tell me which branch you are using?

It's minicpmv-main. The bug also happens when using text-only inference with llama-cli, but not with a different model. Here are some logs:
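
For the text-only reproduction, an invocation of roughly this shape also triggers it (prompt illustrative; -m, -p, and -ngl are standard llama.cpp options):

./llama-cli -m ggml-model-f16.gguf -p "Tell me about polar bears" -ngl 50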

gollwi01@a24cuda:~/Programmieren/minicpmv-llamacpp> git status
On branch minicpmv-main
Your branch is up to date with 'origin/minicpmv-main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        "Knut_der_Eisb\303\244r_Januar_2011.jpg"

nothing added to commit but untracked files present (use "git add" to track)
gollwi01@a24cuda:~/Programmieren/minicpmv-llamacpp> md5sum ggml-model-f16.gguf mmproj-model-f16.gguf 
047524a4cac91584e954931192246629  ggml-model-f16.gguf
90ea2303505e376df5c7ee16f316b663  mmproj-model-f16.gguf
gollwi01@a24cuda:~/Programmieren/minicpmv-llamacpp> ./llama-minicpmv-cli -m ggml-model-f16.gguf --mmproj mmproj-model-f16.gguf --temp 0.1 --no-mmap -p "Describe the image" --image Knut_der_Eisbär_Januar_2011.jpg
...
minicpmv_init: llama process image in  4517.89 ms.
<user>Describe the image
<assistant>
The image captures a polar bear in a zoo or sanctuary setting, resting on a rock. The bear is white with some patches of dirt or mud on its fur, indicating it might have been playing or rolling around. The bear's eyes are closed, and it appears to be in a relaxed or possibly sleepy state. The background consists of large, rugged rocks, suggesting a naturalistic enclosure designed to mimic the bear's natural habitat. The lighting in the image is natural, suggesting it was taken during the day.

gollwi01@a24cuda:~/Programmieren/minicpmv-llamacpp> ./llama-minicpmv-cli -m ggml-model-f16.gguf --mmproj mmproj-model-f16.gguf --temp 0.1 --no-mmap -p "Describe the image" --image Knut_der_Eisbär_Januar_2011.jpg -ngl 50
...
minicpmv_init: llama process image in   219.81 ms.
<user>Describe the image
<assistant>
GiantGPTGPTGGPTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG^C

The full output is attached (log.txt).

The command to compile was:

PKG_CONFIG_PATH=/home/gollwi01/bin/lib/pkgconfig/ LLAMA_CUDA=1 CC=gcc-10 CXX=g++-10 make -j 8

In the current version of llama.cpp, that flag should be "GGML_CUDA=1".
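
For reference, on a current llama.cpp tree the equivalent CUDA build would be something like this (assuming the same GCC 10 toolchain; the PKG_CONFIG_PATH setting is omitted):

# Makefile build with the renamed flag
GGML_CUDA=1 CC=gcc-10 CXX=g++-10 make -j 8

# or the CMake route
cmake -B build -DGGML_CUDA=ON
cmake --build build -j 8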

auriocus avatar Aug 15 '24 09:08 auriocus

Update: It seems to work with the minicpmv-main-dev branch. I haven't checked quantized versions, though.
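
For anyone who wants to try the same workaround, switching branches and rebuilding looks roughly like this (assuming the fork's remote is named origin, and the same build flags as above):

git fetch origin
git checkout minicpmv-main-dev
make clean
LLAMA_CUDA=1 CC=gcc-10 CXX=g++-10 make -j 8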

auriocus avatar Aug 15 '24 12:08 auriocus