Arcmoon

Results 23 comments of Arcmoon

Yeah, I also encountered the same problem, and there is a bug in detect_anoms.py: ![image](https://user-images.githubusercontent.com/50002441/216491207-4ffd3e85-0dba-488d-9f3a-0db5eb3f450f.png) Calling `resample()` alone only returns a resampler object, so it should be chained with `sum()` or `count()`.
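To illustrate the bug being described (a minimal sketch with made-up data, not the actual detect_anoms.py code): in pandas, `resample()` by itself returns a `Resampler` object rather than aggregated values, so an aggregation such as `sum()` or `count()` must be chained onto it.

```python
import pandas as pd

# Six hourly data points
idx = pd.date_range("2023-01-01", periods=6, freq="h")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

r = s.resample("2h")          # Resampler object only; nothing aggregated yet
agg = s.resample("2h").sum()  # chained aggregation yields a Series: 3, 7, 11
```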

> > > Implemented this [packing-with-FA2](https://huggingface.co/blog/zh/packing-with-FA2); in my tests, this approach has higher training throughput than neat_packing
> >
> > Has this flattened packing been validated for convergence?
>
> I tried neat_packing and flatting_packing on the same dataset with the same training config, and found that the initial loss of flatting_packing is significantly higher than neat_packing (2.1 vs 0.9). flatting_packing also takes more training steps than neat_packing (10454 vs 9850), and the final result is worse than neat_packing...
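For context, the flattened-packing idea from the linked post can be sketched as follows (a minimal illustration, not the actual implementation under discussion; the helper name is hypothetical): samples are concatenated into one long sequence, and `position_ids` restart at every sample boundary, which FlashAttention-2's variable-length path can use to keep documents from attending to each other.

```python
def flatten_pack(samples):
    """Concatenate token-id lists and restart position_ids per sample."""
    input_ids, position_ids = [], []
    for seq in samples:
        input_ids.extend(seq)
        # Positions restart at 0 for each packed sample, marking boundaries
        position_ids.extend(range(len(seq)))
    return input_ids, position_ids
```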

In my case, when training Qwen2.5-14B-Instruct, the grad norm quickly increases and becomes NaN.
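A common mitigation for this kind of blow-up (a generic sketch, not tied to any specific trainer; the helper name and threshold are hypothetical) is to check the global grad norm before the optimizer step and skip the update when it is non-finite or exploding:

```python
import math

def should_skip_step(grad_norm, max_norm=1e4):
    # Skip the optimizer step when the global grad norm is NaN/inf or
    # exceeds a sanity threshold, instead of corrupting the weights.
    return not math.isfinite(grad_norm) or grad_norm > max_norm
```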

> @Arcmoon-Hu which version of liger-kernel are you on, and did you not see the issue without applying the kernel?

Thanks for the quick reply. The version of liger-kernel is 0.3.1. Actually,...

> @Arcmoon-Hu could you provide a minimal reproducible script for the issue? thanks!

The issue is solved; I just pulled the latest code and rebuilt it. It's really awesome! I...

https://github.com/deepspeedai/DeepSpeed/issues/6926#issuecomment-2607915663

Sorry, it's not stably reproducible either; with some luck, resuming training works.

> > Sorry, it's not stably reproducible either; with some luck, resuming training works.
>
> How did you get lucky enough to continue? I tried many times and it reproduces almost every time: after resume_from_checkpoint it always OOMs, even though the initial LoRA SFT run never reports OOM.

Pure luck. I tried three times without changing anything, and the third attempt succeeded. Total superstition, haha.