
Question about full-parameter finetuning

Open dydxdt opened this issue 1 year ago • 2 comments

Thanks for your great work! I have a question about the training arguments: is max_steps=10000 appropriate for full-parameter finetuning?

I use my own training dataset for full-parameter finetuning; it has around 240,000 samples covering three different tasks (caption, OCR, ...). After training with the default settings, the training log shows "epoch: 0.32", meaning only about a third of the training data was used. I then trained with num_train_epochs=5 (the same as Qwen) instead of max_steps, but the 5-epoch model performs worse than the 10000-step model on my caption test set, even though the loss looks normal. Can you give some advice for this situation? Thanks!

10000 steps (corresponding to the red line; ignore the blue line): [screenshot: training loss curve]

~5 epochs: [screenshot: training loss curve]

dydxdt avatar Jun 18 '24 08:06 dydxdt
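For reference, the "epoch: 0.32" figure follows directly from the effective batch size: with max_steps fixed, the fraction of the dataset seen is max_steps divided by steps per epoch. A minimal sketch in plain Python, with hypothetical values for per-device batch size, gradient accumulation, and GPU count (the issue does not state them):

```python
# Rough arithmetic for how far max_steps gets through a dataset.
# The batch-size numbers below are hypothetical; plug in your own.
dataset_size = 240_000          # samples in the finetuning set
per_device_batch_size = 2       # assumed; check your training args
grad_accum_steps = 1            # assumed
num_gpus = 4                    # assumed

effective_batch = per_device_batch_size * grad_accum_steps * num_gpus
steps_per_epoch = dataset_size / effective_batch

max_steps = 10_000
epochs_covered = max_steps / steps_per_epoch
print(f"steps per epoch: {steps_per_epoch:.0f}")          # 30000
print(f"epochs covered by max_steps={max_steps}: {epochs_covered:.2f}")
# With these assumed values: 10000 / 30000 -> ~0.33 epochs,
# consistent with the "epoch: 0.32" seen in the log.
```

Any configuration with an effective batch size of 8 reproduces the reported figure; with a different effective batch size, the arithmetic adjusts accordingly.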

How many GPUs did you use for full-parameter finetuning? I tried both 2 V100s and 4 V100s, and neither worked.

1SingleFeng avatar Jun 20 '24 03:06 1SingleFeng

When I run full-parameter finetuning, the console doesn't print any loss information. Has anyone run into this?

todaydeath avatar Jun 29 '24 09:06 todaydeath
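If the run uses a HuggingFace Transformers Trainer (as the MiniCPM-V finetune scripts do), loss lines only reach the console when step logging is enabled, and under a multi-GPU launch only rank 0 prints. A minimal sketch of the relevant TrainingArguments, with illustrative values:

```python
from transformers import TrainingArguments

# Loss is printed to the console only when step logging is enabled;
# these are the standard Transformers knobs (values are illustrative).
args = TrainingArguments(
    output_dir="output",        # placeholder path
    logging_strategy="steps",   # log every `logging_steps` steps
    logging_steps=10,
    report_to="none",           # or "tensorboard"/"wandb" for loss curves
)
```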

How many GPUs did you use for full-parameter finetuning? I tried both 2 V100s and 4 V100s, and neither worked.

For full-parameter finetuning you probably need 8 V100s.

LDLINGLINGLING avatar Jul 04 '24 09:07 LDLINGLINGLING
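That GPU count is consistent with a back-of-envelope memory estimate for full-parameter finetuning with mixed-precision AdamW. A rough sketch, assuming an ~8B-parameter model (e.g., MiniCPM-Llama3-V 2.5) and ignoring activations and fragmentation:

```python
# Back-of-envelope GPU memory for full finetuning with AdamW in mixed
# precision (fp16 weights/grads + fp32 master weights and optimizer
# moments), activations excluded. All numbers are rough estimates.
params = 8e9                 # ~8B parameters (assumed model size)

bytes_per_param = (
    2        # fp16 weights
    + 2      # fp16 gradients
    + 4      # fp32 master copy of the weights
    + 4 + 4  # fp32 Adam first and second moments
)
total_gb = params * bytes_per_param / 1024**3
print(f"model + optimizer states: ~{total_gb:.0f} GB")  # ~119 GB

# Sharded across GPUs with ZeRO, plus activations and overhead:
v100_mem_gb = 32
for n in (2, 4, 8):
    print(f"{n} x V100 (32 GB): {n * v100_mem_gb} GB total")
# 2 or 4 V100s (64/128 GB) leave little or no headroom for
# activations, while 8 V100s (256 GB) can fit states plus activations.
```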

(quoting the original question and loss-curve screenshots above)

The screenshots you posted only show the training loss, which cannot reflect the real downstream effect, so looking at the loss alone is not meaningful.

LDLINGLINGLING avatar Jul 04 '24 09:07 LDLINGLINGLING
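Concretely, the comparison that matters is a held-out metric per checkpoint, not the training loss. A minimal sketch, assuming captions have already been generated for each checkpoint, using BLEU via sacrebleu as a stand-in for whatever caption metric you normally report (all names and strings below are hypothetical):

```python
import sacrebleu

# Compare checkpoints by a held-out caption metric instead of train
# loss. `predictions_by_ckpt` maps a checkpoint name to its generated
# captions, aligned index-by-index with `references`.
references = ["a dog running on the beach", "a red car parked outside"]
predictions_by_ckpt = {
    "step-10000": ["a dog runs on the beach", "a red car parked outside"],
    "epoch-5":    ["a dog on sand", "red car outside"],
}

for name, preds in predictions_by_ckpt.items():
    bleu = sacrebleu.corpus_bleu(preds, [references])
    print(f"{name}: BLEU = {bleu.score:.1f}")
```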