Arcmoon

Results 23 comments of Arcmoon

Yeah, I also encountered the same problem, and there is a bug in detect_anoms.py: ![image](https://user-images.githubusercontent.com/50002441/216491207-4ffd3e85-0dba-488d-9f3a-0db5eb3f450f.png) Calling `resample()` alone only returns a resampler object, so it should be chained with `sum()` or `count()`.
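To illustrate the bug being described (a minimal sketch with made-up data, not the actual detect_anoms.py code): in pandas, `resample()` by itself returns a `Resampler` object rather than aggregated values, so an aggregation such as `sum()` or `count()` must be chained onto it.

```python
import pandas as pd

# Six hourly data points
idx = pd.date_range("2023-01-01", periods=6, freq="h")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

r = s.resample("2h")          # Resampler object only; nothing aggregated yet
agg = s.resample("2h").sum()  # chained aggregation yields a Series: 3, 7, 11
```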

> > > Implemented this [packing-with-FA2](https://huggingface.co/blog/zh/packing-with-FA2); in my tests, this approach has higher training throughput than neat_packing
> >
> > Has this flattened packing been validated for convergence?
>
> I tried neat_packing and flatting_packing on the same dataset with the same training config, and found that the initial loss of flatting_packing is significantly higher than neat_packing (2.1 vs 0.9). flatting_packing also takes more training steps than neat_packing (10454 vs 9850), and the final result is worse than neat_packing...
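For context, the flattened-packing idea from the linked post can be sketched as follows (a minimal illustration, not the actual implementation under discussion; the helper name is hypothetical): samples are concatenated into one long sequence, and `position_ids` restart at every sample boundary, which FlashAttention-2's variable-length path can use to keep documents from attending to each other.

```python
def flatten_pack(samples):
    """Concatenate token-id lists and restart position_ids per sample."""
    input_ids, position_ids = [], []
    for seq in samples:
        input_ids.extend(seq)
        # Positions restart at 0 for each packed sample, marking boundaries
        position_ids.extend(range(len(seq)))
    return input_ids, position_ids
```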

In my case, when training Qwen2.5-14B-Instruct, the grad norm quickly increases and becomes NaN.
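A common mitigation for this kind of blow-up (a generic sketch, not tied to any specific trainer; the helper name and threshold are hypothetical) is to check the global grad norm before the optimizer step and skip the update when it is non-finite or exploding:

```python
import math

def should_skip_step(grad_norm, max_norm=1e4):
    # Skip the optimizer step when the global grad norm is NaN/inf or
    # exceeds a sanity threshold, instead of corrupting the weights.
    return not math.isfinite(grad_norm) or grad_norm > max_norm
```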

> @Arcmoon-Hu which version of liger-kernel are you on, and did you not see the issue without applying the kernel?

Thanks for the quick reply. The version of liger-kernel is 0.3.1. Actually,...

> @Arcmoon-Hu could you provide a minimal reproducible script for the issue? thanks!

The issue is solved; I just pulled the latest code and rebuilt it. It's really awesome! I...

https://github.com/deepspeedai/DeepSpeed/issues/6926#issuecomment-2607915663

Sorry, it's not stably reproducible either; with some luck, resuming training works.

> > Sorry, it's not stably reproducible either; with some luck, resuming training works.
>
> How did you get lucky enough to continue? I tried many times and it reproduces almost every time: after resume_from_checkpoint it always OOMs, even though the initial LoRA SFT run never reports OOM.

Pure luck. I tried three times without changing anything, and the third attempt succeeded. Total superstition, haha.