MiniCPM-V [BUG] <title> finetune/dataset.py has a bug

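Multi-image fine-tuning with llm_type=minicpm fails with "data fetch error". After debugging, I found that the conversation_to_ids function in finetune/dataset.py has a bug. Depending on llm_type, conversation_to_ids calls conversation_to_ids_llama3, conversation_to_ids_qwen2, or conversation_to_ids_minicpm. The first two return input_ids as a numpy object, while the last one returns a list directly. However, line 146 of conversation_to_ids reads the size of input_ids through .shape, which causes the error.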
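A minimal sketch of the failure described above, not the actual code from finetune/dataset.py. The token values and the variable name `input_ids_list` are made up for illustration; the only assumption is that the minicpm branch hands back a plain Python list, as reported.

```python
import numpy as np

# Hypothetical stand-in for what the minicpm branch returns: a plain list
# of per-message id arrays (values are invented for this example).
input_ids_list = [np.array([1, 5, 9]), np.array([7, 2])]

# A list has no .shape attribute, so a size check written for numpy fails.
try:
    input_ids_list.shape[-1]
    failed = False
except AttributeError:
    failed = True

# The numpy-returning branches do not hit this, because an ndarray has .shape.
input_ids_array = np.array([1, 5, 9, 7, 2])
length = input_ids_array.shape[-1]
```

An exception raised at this point inside the dataset code would plausibly be what surfaces as the generic "data fetch error", although the thread does not show the traceback.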

Oct 28 '24 08:10 bingo-todd

Looking at the code, input_ids on that line should be ids: input_ids is a list, while ids is the numpy object that input_ids is converted into, and it has the .shape attribute.

Nov 01 '24 09:11 linglu

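There is another error, at line 215 of dataset.py: message_ids = tokenizer.encode(message)[1:]. The first element dropped here may be the image_start_token. In that case the check at line 178, if len(image_start_tokens) != len(image_end_tokens), which compares the number of image start tokens with the number of image end tokens, fails, because one image start token has been cut off and image_start_tokens is one element short.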
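The truncation can be illustrated with a toy example. This is a sketch under stated assumptions, not the real tokenizer: `toy_encode`, `BOS`, `IM_START`, `IM_END` and all id values are hypothetical, and it assumes the first element is an image start token when the tokenizer does not prepend a BOS token. The guarded variant at the end is one possible way to avoid the problem, not a confirmed fix.

```python
# Hypothetical token ids, chosen only for this illustration.
BOS, IM_START, IM_END = 1, 101, 102

def toy_encode(tokens, add_bos):
    """Toy stand-in for tokenizer.encode: optionally prepends BOS."""
    return ([BOS] if add_bos else []) + list(tokens)

def count_markers(ids):
    """Return (number of image start tokens, number of image end tokens)."""
    return ids.count(IM_START), ids.count(IM_END)

# A message that begins with an image placeholder.
message = [IM_START, 7, 7, IM_END, 42]

# If BOS is prepended, [1:] removes only BOS and the markers stay balanced.
with_bos = toy_encode(message, add_bos=True)[1:]

# If no BOS is prepended, [1:] removes the image start token instead.
without_bos = toy_encode(message, add_bos=False)[1:]

def strip_leading_bos(ids):
    """Guarded variant: drop the first id only when it really is BOS."""
    return ids[1:] if ids and ids[0] == BOS else ids

guarded = strip_leading_bos(toy_encode(message, add_bos=False))
```

Here `count_markers(without_bos)` reports one fewer start token than end tokens, which is the mismatch that the length comparison rejects.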

Nov 01 '24 09:11 linglu

same problem, does this problem have a solution?

Nov 05 '24 02:11 wailokkwok

I simply added input_ids = np.hstack(input_ids) before return input_ids, and it seems to work.
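A rough sketch of what this workaround does, assuming the list holds 1-D id arrays. The helper name `to_flat_ids` is hypothetical and does not exist in the repository.

```python
import numpy as np

def to_flat_ids(input_ids):
    """Return a 1-D numpy array whether given a list of arrays or an array.

    Hypothetical helper for illustration only.
    """
    if isinstance(input_ids, np.ndarray):
        return input_ids
    # np.hstack concatenates the 1-D pieces into a single 1-D array.
    return np.hstack(input_ids)

# Invented ids standing in for the list-returning branch.
flat = to_flat_ids([np.array([1, 5, 9]), np.array([7, 2])])

# An array passes through unchanged, so .shape works in both cases.
same = to_flat_ids(np.array([3, 4]))
```

With every branch returning an ndarray, a size check such as `flat.shape[-1]` behaves the same regardless of llm_type.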

Nov 07 '24 00:11 bingo-todd

After this change, did you run into the following problem? RuntimeError: Function torch::autograd::CopySlices returned an invalid gradient at index 1 - got [64, 2304] but expected shape compatible with [192, 2304]

Jan 12 '25 03:01 whoam-challenge

Sorry, I do not remember. Can you post the full log? My guess is that this error is related to the image processing.

Jan 16 '25 03:01 bingo-todd