chenglu66

3 comments by chenglu66

Thanks for the help — we have it running now. I noticed the paper says the learning rate is constant at 2*10^-3, but looking at yaofu/llama-2-7b-80k/blob/main/trainer_state.json, the learning rate actually warms up first. If the data distribution is fairly close to the original, it seems a constant rate from the start should suffice — why is warm-up still needed? Second, the paper says `"rope_theta"` was changed from 10000.0 to 500000, but the uploaded model's config was not updated and `rope_theta` is still 10000.0. I'd appreciate clarification on these two points.
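For context on the schedule being asked about, here is a minimal sketch of a linear warm-up followed by a constant learning rate. The `warmup_steps=100` value and the linear ramp shape are assumptions for illustration only; the actual values would come from the linked trainer_state.json.

```python
def lr_at_step(step: int, base_lr: float = 2e-3, warmup_steps: int = 100) -> float:
    """Linear warm-up to base_lr, then hold constant.

    base_lr matches the 2e-3 mentioned in the paper; warmup_steps
    is a hypothetical placeholder, not taken from the repo.
    """
    if step < warmup_steps:
        # Ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # After warm-up, the rate stays constant.
    return base_lr

# Early steps use a much smaller rate; later steps are constant.
early = lr_at_step(0)    # 2e-5 with the assumed warmup_steps=100
steady = lr_at_step(500)  # 2e-3
```

A common rationale for the ramp is that at initialization (or right after changing context length) gradients can be noisy, and starting at the full rate risks destabilizing training even when the data distribution is similar.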

@RomneyDa Hello! The issue persists in the latest version, v1.0.24-jetbrains. How can it be fixed?

@mamoodi Yes, I just ran the main branch in my Docker container on Windows 10. When I run `make run` it shows "waiting"; I tried many times and waited several minutes, it does...