MiniCPM-V [BUG] lack of position_id in MiniCPMVProcessor

Is there an existing issue / discussion for this?

[X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

[X] I have searched FAQ

Current Behavior

The output returned by MiniCPMVProcessor lacks position_ids.

Expected Behavior

The code in "Steps To Reproduce" below runs successfully.

Steps To Reproduce

I want to use MiniCPMVProcessor to get a processed input and feed it into MiniCPMV. The pseudocode is as follows:

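from transformers import AutoModel, AutoProcessor

# MiniCPM-V ships custom modeling code, so trust_remote_code=True is needed
model = AutoModel.from_pretrained('/path/to/minicpmv_ckpt', trust_remote_code=True)
processor = AutoProcessor.from_pretrained('/path/to/minicpmv_ckpt', trust_remote_code=True)

msgs = [{'role': 'user', 'content': 'Who are you?'}]
# Render the chat template to a prompt string (add_generation_prompt belongs here)
prompt = processor.tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
inputs = processor(prompt)

# Fails as reported: the processor output has no tgt_sizes or position_ids
output_hidden_states = model(**inputs, output_hidden_states=True)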

This raises several errors, including KeyErrors for tgt_sizes and position_ids.
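To see at a glance which entries are absent before calling the model, a small check like the one below may help. It is only a sketch: the key names come from the KeyErrors reported above, `find_missing_keys` is a hypothetical helper and not part of MiniCPM-V or Transformers, and the example dict is a stand-in for the real processor output.

```python
from typing import Iterable, List, Mapping

# Key names taken from the KeyErrors reported above (assumption: there may be others).
REPORTED_MISSING_KEYS = ("tgt_sizes", "position_ids")


def find_missing_keys(
    inputs: Mapping, required: Iterable[str] = REPORTED_MISSING_KEYS
) -> List[str]:
    """Return the required keys that are absent from `inputs`, in order."""
    return [key for key in required if key not in inputs]


# Stand-in for a processor output; any mapping-like object can be passed instead.
example_inputs = {"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]}
print(find_missing_keys(example_inputs))  # ['tgt_sizes', 'position_ids']
```

Running the same check on the actual `inputs` object would show whether the processor output or the model's forward signature is the side that needs to change.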

Environment

- OS: Debian testing
- Python: 3.11
- Transformers: 4.44 (latest)
- PyTorch: 2.3
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.4

Anything else?

No response

Aug 16 '24 09:08 geekifan

Where is the code for the Positional Embedding Interpolation of Slice Encoding implemented?

Aug 24 '24 09:08 PangziZhang523

Hello, did you solve this?

Oct 28 '24 08:10 zhaowenZhou

@zhaowenZhou No, I didn't solve it. I use model.generate as a workaround.

Dec 09 '24 08:12 geekifan