InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
### Checklist - [ ] 1. I have searched related issues but cannot get the expected help. - [ ] 2. The bug has not been fixed in the latest...
### Motivation Thanks for your great work on InternVL2.5! For larger models (38B parameters or more), the fine-tuning scripts are implemented with srun + partition. Is it possible to implement...
https://youtu.be/_EqUR0dYGtE
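After fine-tuning according to the official documentation, I obtained the following files. I followed the official guide and used the LoRA strategy, but there is no adapter_config.json, and I don't know how to use these fine-tuned parameters. When I try to load them directly, I get the message: "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference."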
### 📚 The doc issue Does it support the Swedish language? I'm working on a Swedish document extraction use case. @lvhan028 ### Suggest a potential alternative/fix _No response_
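Problem: the project currently depends on too many packages, and many of them have no version specified in requirements, so a one-step install often runs into package version conflicts. Two suggestions: 1. Pin the version of each dependency and update the pins regularly, to prevent package conflicts and cases where a pinned version cannot be found; 2. Package and publish a few tested, ready-to-use base Docker environments to make development easier.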
### Checklist - [x] 1. I have searched related issues but cannot get the expected help. - [ ] 2. The bug has not been fixed in the latest version....
### Motivation Really nice work. Will you release SFT scripts for InternVL3? ### Related resources _No response_ ### Additional context _No response_
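InternVL is really nice work. I have a question: I see that InternVL processes input images dynamically, adaptively resizing them to an integer multiple of 448 and then splitting them into patches. If my labels contain bboxes, do they also need to be rescaled to match the resized image (splitting into patches should not affect the box scale, right)? I also see that the grounding task requires boxes to be scaled to [0,1000]. Since the input is processed dynamically, is it enough to scale the boxes to [0,1000] without considering the dynamic input, or should I first align the boxes with the image at the multiple-of-448 scale and then rescale them to [0,1000]?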
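The arithmetic behind this question can be sketched as follows. This is only an illustration of coordinate normalization, not a description of how InternVL's preprocessing is implemented. It assumes the whole image is resized without cropping or padding; under that assumption, coordinates normalized by image width and height come out the same before and after the resize, up to rounding. The helper names, image sizes, and box values are hypothetical.

```python
def normalize_box(box, width, height, scale=1000):
    """Map a pixel-space (x1, y1, x2, y2) box to integer coordinates in [0, scale]."""
    x1, y1, x2, y2 = box
    return (
        round(x1 / width * scale),
        round(y1 / height * scale),
        round(x2 / width * scale),
        round(y2 / height * scale),
    )


def resize_box(box, src_size, dst_size):
    """Rescale a pixel-space box when the whole image is resized (no crop, no padding)."""
    (src_w, src_h), (dst_w, dst_h) = src_size, dst_size
    x1, y1, x2, y2 = box
    return (
        x1 * dst_w / src_w,
        y1 * dst_h / src_h,
        x2 * dst_w / src_w,
        y2 * dst_h / src_h,
    )


# Hypothetical values chosen for illustration only.
orig_size = (1280, 720)     # original image (width, height)
resized_size = (896, 448)   # each side is a multiple of 448
box = (320, 180, 960, 540)  # pixel-space box in the original image

# Path A: normalize directly from the original image size.
direct = normalize_box(box, *orig_size)

# Path B: rescale the box with the image first, then normalize by the new size.
via_resize = normalize_box(resize_box(box, orig_size, resized_size), *resized_size)

print(direct, via_resize)
```

Both paths divide each coordinate by the size of the image it is measured in, so the resize factor cancels. Whether this matches the repository's expected annotation format is for the maintainers to confirm.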
Hello, I have a quick question about the new dataset release MMPR 1.2. The dataset intro says there are 3 million samples (https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2), but in the `meta.json`, it only shows...