Yinghan Huang
> > How do you set the ckpt_path? If you set the InternVideo2_Stage2 ckpt as `vision_ckpt_path`, it shouldn't hit a size mismatch on `text_proj.weight`. > > Thanks, I...
> I can train with ds_z0_config, but ds_z3_config and ds_z2_config both hang. Thanks a lot! I'll give it a try.
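> You could try a single node. I'm on qwen3-vl:30b: with 3 GPUs it hangs indefinitely, but a single GPU worked. I tried it: on a single 96 GB GPU I hit CUDA OOM, though it does run with a smaller dataset.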
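> I can train with ds_z0_config, but ds_z3_config and ds_z2_config both hang. Strange... I see exactly the same behavior here. Does ds_z0_config work because qwen3omni currently only supports data parallelism?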
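> It looks like a chattemplate.jinja problem. Diff it against the original model's template. Indeed, the original model doesn't have this file; it has chat_template.json instead, and the fine-tuned version (left side) is missing a lot.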
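A minimal sketch of the template comparison suggested above, using Python's standard `difflib`. The helper name `diff_templates` and the sample template strings are hypothetical; in practice the two strings would be read from the fine-tuned checkpoint's chattemplate.jinja and from the original model's chat_template.json (assuming the template text is stored there as a string field).

```python
import difflib


def diff_templates(original: str, finetuned: str) -> list[str]:
    """Return a unified diff between two chat template strings."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            finetuned.splitlines(),
            fromfile="original",
            tofile="finetuned",
            lineterm="",
        )
    )


# Hypothetical template contents, for illustration only.
original = "{% for m in messages %}\n{{ m.role }}: {{ m.content }}\n{% endfor %}"
finetuned = "{% for m in messages %}\n{{ m.content }}\n{% endfor %}"

for line in diff_templates(original, finetuned):
    print(line)
```

Lines prefixed with `-` exist only in the original template and lines prefixed with `+` only in the fine-tuned one, so a template that lost content during fine-tuning shows up as a run of `-` lines.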
> It looks like a chattemplate.jinja problem. Diff it against the original model's template. After replacing it with the original chat_template, I ran into another rather odd error. I kept bf16=true during fine-tuning, same as the original, but at inference the step `inputs = inputs.to(model.device).to(model.dtype)` fails: Traceback (most recent call last): File "/cpfs01/users/yinghan.huang/Software/anaconda3/envs/qwen3omni-dense/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/cpfs01/users/yinghan.huang/Software/anaconda3/envs/qwen3omni-dense/lib/python3.10/runpy.py", line 86, in _run_code exec(code,...
> Try updating your torch version? See whether 2.7.1 still has this problem. Solved. It does require upgrading torch to 2.7.0. Thanks!
I tried setting find_unused_parameter to true but ran into another issue
Besides, the README says that a CLIP teacher is necessary for pretraining, but how should this be configured? I can not solve this problem using the provided...
I finally got pretraining running with this config. But if I just want to fine-tune with the VTC and VTM losses for a retrieval task, what should I do?