Lynn
Results: 2 comments of Lynn
Hi, with only MHA, is it possible to reach max_model_len = 128k? In my test, I could only get to 12k.
Hello, I set tp_size=1 again, but it still seems to hang at this point after the model finishes loading, and NCCL eventually times out. I still get an error like the following: RuntimeError: NCCL communicator was aborted on rank 1. Original reason for failure was: [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=727, OpType=BROADCAST, Timeout(ms)=1800000) ran...