dyyoungg comments

Results 6 comments of dyyoungg

Are there plans to support long-sequence training for multimodal models such as llava1.5?

> Does llava 1.5 support long sequence training?

As far as I can tell, not at the moment.

Are there plans to support long-sequence training for multimodal models such as llava1.5?

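> May I ask what your long-sequence training scenario is? As far as I can see, sequence lengths in Llava training are generally not long.

Many video understanding models are currently based on llava, but the video length they can handle is short. Understanding long videos requires more image tokens.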

Are there plans to support long-sequence training for multimodal models such as llava1.5?

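> Long-sequence training is not the problem; xtuner already supports it. The main problem is the need for multimodal long-sequence datasets.

My confusion is really about multimodal data processing: the images have to pass through the vision encoder and projector. With many images, say hundreds or thousands, you cannot wait until the whole LLM sequence has been concatenated before splitting it, because that would be inefficient. Once a vision encoder is involved, it seems this training pipeline has to change.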
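To make the concern concrete, here is a minimal sketch of one possible way around it. This is not how xtuner does it; it is only an illustration. Because the number of tokens each image contributes is known before encoding, the split of the final sequence can be planned from token counts alone, and each rank then only needs to run the vision encoder and projector on the images that fall inside its slice. The function name `plan_image_encoding`, the segment format, and the figure of 576 tokens per image (llava1.5 with a 336px CLIP ViT-L/14, i.e. 24 x 24 patches) are assumptions made for this example.

```python
def plan_image_encoding(segments, world_size):
    """Plan a sequence-parallel split from token counts only.

    segments: list of ("text", n_tokens) or ("image", n_tokens), in the
              order they appear in the final LLM sequence (hypothetical format).
    Returns (bounds, images_per_rank), where bounds[r] is the [start, end)
    token range of rank r and images_per_rank[r] lists the image indices
    that rank r has to encode.
    """
    total = sum(n for _, n in segments)
    chunk = -(-total // world_size)  # ceiling division
    bounds = [(r * chunk, min((r + 1) * chunk, total)) for r in range(world_size)]

    images_per_rank = [[] for _ in range(world_size)]
    offset = 0
    image_idx = 0
    for kind, n in segments:
        if kind == "image":
            start, end = offset, offset + n
            for rank, (lo, hi) in enumerate(bounds):
                # The image belongs to every rank whose slice overlaps it.
                if start < hi and end > lo:
                    images_per_rank[rank].append(image_idx)
            image_idx += 1
        offset += n
    return bounds, images_per_rank


# Assumed example: 10 text tokens, 4 images of 576 tokens each, 6 text tokens.
segments = [("text", 10)] + [("image", 576)] * 4 + [("text", 6)]
bounds, images_per_rank = plan_image_encoding(segments, world_size=2)
```

In this sketch an image that straddles a slice boundary is encoded on both neighbouring ranks, which trades a little duplicated vision-encoder work for not having to gather all image features in one place before splitting.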

The loss value when the model converges

> > When the model converges to a relatively good situation, how much loss will be trained? > > When the model's **final loss** approximates 25, the audio reconstruction quality...

Pretrain Hubert on english and chinese speech dataset.

> > We believe that the key of training hubert base model is to look at the performance of the pre-trained model on main downstream tasks. You can finetune the...

This ma adaption pretrain code is completely different from the paper

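> Perhaps it is because the speech data is assumed to have already been converted to token data.

That would indeed work: extract the audio tokens in advance and use a placeholder token such as "audio" in their place. But then the model's data preparation has to be modified, i.e. in forward the projected audio features must be inserted at the positions given by the token id. The code clearly has no such step and uses LLamaForCausalLLM directly, which does not make sense.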
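For illustration, the missing step described above could look roughly like the sketch below. It is not taken from the repository under discussion. It uses plain Python lists instead of tensors so that it runs without dependencies, and the names `merge_audio_features` and `AUDIO_TOKEN_ID` as well as the id value are hypothetical.

```python
AUDIO_TOKEN_ID = 32000  # hypothetical id of the "audio" placeholder token


def merge_audio_features(token_ids, text_embeds, audio_feats,
                         audio_token_id=AUDIO_TOKEN_ID):
    """Replace placeholder embeddings with projected audio features.

    token_ids:   list of token ids for one sequence.
    text_embeds: one embedding (list of floats) per token id.
    audio_feats: projected audio features, one per placeholder, in order.
    """
    n_slots = sum(1 for t in token_ids if t == audio_token_id)
    if n_slots != len(audio_feats):
        raise ValueError(
            f"{n_slots} placeholder tokens but {len(audio_feats)} audio features"
        )
    feats = iter(audio_feats)
    merged = []
    for tok, emb in zip(token_ids, text_embeds):
        # Keep the text embedding unless this position is an audio placeholder.
        merged.append(next(feats) if tok == audio_token_id else emb)
    return merged


token_ids = [1, AUDIO_TOKEN_ID, AUDIO_TOKEN_ID, 5]
text_embeds = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.5, 0.5]]
audio_feats = [[9.0, 9.0], [8.0, 8.0]]
merged = merge_audio_features(token_ids, text_embeds, audio_feats)
```

In a real model the same replacement would be done on tensors inside forward, and the merged embeddings would be passed to the language model as `inputs_embeds` rather than `input_ids`.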