Kingking
Are the linear layer and the LLM trained together in all three stages?
Are subtitles also used for training in the third stage?
> For LLaVA-1.6, it uses both base features (336x336 resolution) and higher resolution features. To perform inference similar to 1.5, you only need to use the base features to avoid...
> > _No description provided._ > > Maybe you can contribute to this part. All you need to do is add a llava_qwen2.py, the corresponding conv_mode, and a preprocess_qwen2 function...
This is the template the instruct model should use, right?
Why do you recommend using the base model directly? Wouldn't the instruct model perform better?
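https://github.com/slei109/PATNet/issues/17#issue-1703104975 Hello, sorry to bother you. Did you ever solve the deepglobe processing problem described here? I ran into the same issue: using the processing file provided by the author, I end up with 9175 images instead of the 5666 reported in the paper.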