leon-cas

2 comments by leon-cas

> @coddinglxf I just solved that problem with `nn.NLLLoss(ignore_index=0)`, where 0 is the pad_index. Even though we target 0 (the unmasked value), it doesn't contribute to the loss during backpropagation, which is why...
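To make the `ignore_index` behaviour concrete, here is a minimal sketch in plain Python of what `nn.NLLLoss(ignore_index=0)` computes with the default `reduction='mean'` and no class weights. It assumes, as in the comment, that padded and unmasked positions both carry target 0. The helper `nll_loss_ignore` is hypothetical and only mirrors the documented semantics: rows whose target equals `ignore_index` are dropped from both the sum and the count, so they add nothing to the loss and receive no gradient. In PyTorch itself you would call `nn.NLLLoss(ignore_index=0)(log_probs, targets)` with a `(N, C)` log-probability tensor and an `(N,)` target tensor.

```python
import math


def nll_loss_ignore(log_probs, targets, ignore_index=0):
    """Hypothetical pure-Python stand-in for nn.NLLLoss(ignore_index=...), reduction='mean'.

    log_probs: list of rows, each a list of log-probabilities over the classes.
    targets:   list of class indices, one per row.
    """
    total = 0.0
    count = 0
    for row, t in zip(log_probs, targets):
        if t == ignore_index:
            continue  # pad / unmasked position: skipped, so no loss and no gradient
        total -= row[t]
        count += 1
    # If every position is ignored, the mean is over zero elements (0/0).
    return total / count if count else float("nan")


# Three token positions over a 3-word vocabulary; position 0 has target 0 (ignored).
log_probs = [
    [math.log(0.1), math.log(0.2), math.log(0.7)],
    [math.log(0.2), math.log(0.5), math.log(0.3)],
    [math.log(0.6), math.log(0.1), math.log(0.3)],
]
targets = [0, 1, 2]

loss = nll_loss_ignore(log_probs, targets)
print(round(loss, 4))  # → 0.9486, i.e. (-log 0.5 - log 0.3) / 2

# Changing the prediction at the ignored position leaves the loss unchanged.
log_probs[0] = [math.log(0.9), math.log(0.05), math.log(0.05)]
assert nll_loss_ignore(log_probs, targets) == loss
```

Because ignored rows are excluded from the denominator as well, the mean is taken only over the real (masked) targets rather than being diluted by padding.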

> "Requires 4.49.0." @hiyouga I'm already using transformers 4.49.0. With the same configuration, training Qwen2.5-VL is 5-6x slower than training Qwen2-VL. Is there any fix for this?