Lynn

Results: 2 comments by Lynn

Hi, with only MHA, is it possible to reach max_model_len = 128k? In my test, it seems I can only reach about 12k.

Hello, I reset tp_size=1. It still seems to hang at this point after the model finishes loading, and eventually NCCL times out. ![image](https://github.com/OpenLMLab/LOMO/assets/37737346/a66c7ee3-44c2-48c7-b765-8bbfdd882a37) I still get an error like the following: RuntimeError: NCCL communicator was aborted on rank 1. Original reason for failure was: [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=727, OpType=BROADCAST, Timeout(ms)=1800000) ran...