Nguyen Xuan Bac comments

Why use the `AdamW` class instead of `torch.optim.adamw`?

Hi, at the time I did this project, I did not know that `torch.optim.adamw` existed. Thank you for pointing it out.
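For readers comparing a hand-written class with the built-in, here is a minimal pure-Python sketch of one AdamW step for a list of scalar parameters. It follows the update rule given in the PyTorch `torch.optim.AdamW` documentation (decoupled weight decay applied to the parameter, then a bias-corrected Adam step). The names `AdamWState` and `adamw_step` are hypothetical and are not taken from the original project. In practice you would simply construct `torch.optim.AdamW(model.parameters(), lr=..., weight_decay=...)`.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AdamWState:
    """Hypothetical per-parameter optimizer state (first/second moments and step count)."""
    m: List[float]
    v: List[float]
    t: int = 0


def adamw_step(params: List[float], grads: List[float], state: AdamWState,
               lr: float = 1e-3, betas=(0.9, 0.999), eps: float = 1e-8,
               weight_decay: float = 1e-2) -> List[float]:
    """Return updated parameters after one AdamW step (sketch, no amsgrad)."""
    beta1, beta2 = betas
    state.t += 1
    bias1 = 1 - beta1 ** state.t  # bias correction for the first moment
    bias2 = 1 - beta2 ** state.t  # bias correction for the second moment
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Decoupled weight decay: shrink the parameter directly,
        # instead of adding weight_decay * p to the gradient (as plain Adam + L2 would).
        p = p - lr * weight_decay * p
        # Exponential moving averages of the gradient and squared gradient.
        state.m[i] = beta1 * state.m[i] + (1 - beta1) * g
        state.v[i] = beta2 * state.v[i] + (1 - beta2) * g * g
        m_hat = state.m[i] / bias1
        v_hat = state.v[i] / bias2
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
    return new_params


state = AdamWState(m=[0.0], v=[0.0])
print(adamw_step([1.0], [100.0], state, lr=0.1, weight_decay=0.0))
```

On the first step the bias-corrected update has magnitude close to `lr` regardless of the gradient's scale, which is a quick sanity check when validating a custom class against the built-in.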

About InternVL3.5 training

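> > > You need to compare TGS (Tokens per GPU per Second). After packing, the number of samples and tokens processed per iteration increases substantially, so you can't directly compare the time of a single iteration.
> >
> > I'm comparing the total time needed to train one epoch on the same training data. Previously, 2.5 4B took 4h33min; now it takes 24h, on a machine with eight A800 GPUs. The difference is that I now define one epoch via `--max_steps`, computed as `max_steps = (num_samples // batch_size) * num_epochs`. Is there a problem with my calculation?
>
> In that case you have probably trained for more than one epoch. The batch size is 1, but each sequence packs more than one sample, so you can't calculate it that simply. You can use [the log at this location](https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat_gpt_oss/internvl/model/internvl_chat/modeling_internvl_chat.py#L135) in our code to estimate how many samples each iteration processes, and then use that value to work backwards to the right `max_steps`.

Hello, what exactly...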
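The two points in the quoted replies can be sketched as arithmetic. This is a minimal sketch under stated assumptions, not code from the InternVL repository: `tokens_per_gpu_per_second` and `estimate_max_steps` are hypothetical helpers, `sequences_per_step` is assumed to be the number of packed sequences consumed per optimizer step across all GPUs (per-device batch size × number of GPUs × gradient-accumulation steps), and `avg_samples_per_sequence` is assumed to be the average number of original samples per packed sequence, read off the training log linked above (the exact log format is not reproduced here).

```python
import math


def tokens_per_gpu_per_second(total_tokens: int, num_gpus: int,
                              elapsed_seconds: float) -> float:
    """TGS: tokens processed per GPU per second over a measured window."""
    if num_gpus <= 0 or elapsed_seconds <= 0:
        raise ValueError("num_gpus and elapsed_seconds must be positive")
    return total_tokens / (num_gpus * elapsed_seconds)


def estimate_max_steps(num_samples: int, num_epochs: int,
                       sequences_per_step: int,
                       avg_samples_per_sequence: float) -> int:
    """Hypothetical helper: optimizer steps needed to cover num_epochs with packing.

    Each step consumes sequences_per_step packed sequences, and each packed
    sequence holds about avg_samples_per_sequence original samples, so one
    step covers their product in samples.
    """
    samples_per_step = sequences_per_step * avg_samples_per_sequence
    return math.ceil(num_samples * num_epochs / samples_per_step)


# Illustrative numbers only (not measured): 100k samples, 8 GPUs x batch size 1,
# about 4 samples packed into each sequence.
print(estimate_max_steps(100_000, 1, 8, 4.0))
print((100_000 // 8) * 1)  # naive formula that treats each packed sequence as one sample
```

With these illustrative numbers the naive formula asks for 12500 steps while one epoch actually needs about 3125, so it trains for roughly as many epochs as the packing factor. Likewise, comparing TGS rather than per-iteration time normalizes away the extra tokens packed into each iteration.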