Support reinforcement learning training for multimodal large models on cogvlm2, e.g. DPO training

Open kaka-Cao opened this issue 1 year ago • 1 comment

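Describe the feature
Please describe the feature requested here.
Develop an RLHF training framework for multimodal large models such as cogvlm2, integrating common reinforcement learning algorithms such as PPO, DPO, ORPO, and KTO, together with multimodal reinforcement learning datasets, for example the RLHF-V dataset released by Tsinghua and the VLFeedback dataset.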
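
For reference, the DPO objective named in this request can be sketched in a few lines. This is only a minimal illustration, assuming the summed log-probabilities of the chosen and rejected responses have already been computed by the policy and by a frozen reference model (for a multimodal model, conditioned on both the image and the prompt). The function name `dpo_loss` and the default `beta=0.1` are assumptions for illustration, not part of cogvlm2 or of any existing framework.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from sequence log-probabilities (hypothetical helper)."""
    # How much more the policy prefers the chosen response than the reference does.
    margin = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# When the policy favors the chosen response more than the reference, the loss drops.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
```

In an actual training loop the same expression would be applied to batched tensors and averaged over the batch; the scalar version here only shows the shape of the objective.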

Paste any useful information
Paste any useful information, including papers, github links, etc.
Something similar to this link: https://github.com/vlf-silkie/VLFeedback/tree/main

Additional context
Add any other context or information here.
None

Jul 29 '24 08:07 kaka-Cao

See the documentation: https://swift.readthedocs.io/zh-cn/latest/Multi-Modal/%E4%BA%BA%E7%B1%BB%E5%81%8F%E5%A5%BD%E5%AF%B9%E9%BD%90%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.html

Jul 29 '24 14:07 hjh0119