> Thanks for reporting @BramVanroy, I managed to reproduce and I opened a fix here: #6741

Thanks a lot! I have run into the same problem. Can I use your fix...
Thank you very much for your contribution. One question: does the current implementation still use PyTorch's memory-efficient SDPA rather than flash_attn_varlen_func? @RdoubleA
@RdoubleA Thank you very much for your contribution. Another question: I would like to understand whether constructing correct position IDs has a significant impact, especially with regard to Rotary Position Embedding. After...
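For intuition on why position IDs interact with RoPE, here is a minimal sketch in plain Python. The `rope_angles` helper is hypothetical (not from torchtune); it only computes the per-pair rotation angles, which in RoPE depend directly on the token's position ID. When packing two sequences, a per-sequence position ID that restarts at 0 yields different rotations than a naive continuous counter:

```python
import math

def rope_angles(pos, dim=8, base=10000.0):
    # Hypothetical helper: RoPE rotation angle for each feature pair
    # at a given token position (theta_i = pos * base^(-2i/dim)).
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

# Packing two sequences of length 3 into one buffer:
# with correct per-sequence position IDs, the second sequence restarts at 0;
# a naive continuous counter keeps incrementing instead.
correct = [0, 1, 2, 0, 1, 2]
naive   = [0, 1, 2, 3, 4, 5]

# The 4th packed token (start of sequence 2) gets different rotations
# under the two schemes, so its attention scores differ.
print(rope_angles(correct[3])[0], rope_angles(naive[3])[0])  # 0.0 3.0
```

The gap grows with the position offset, which is why incorrect position IDs in packed batches can matter more for long sequences.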
@HYLcool Got it, thank you!
@liushz Thank you for your response; I appreciate your clarification. However, the parameter in your reply pertains to setting tensor parallelism in vLLM. My intention is to load the entire...
@tonysy Could you possibly offer a quick example? I'm quite unsure how to use it. Many thanks for your assistance.
@IcyFeather233 Thank you 😂. I understand that tensor_parallel_size can be set to the number of GPUs (2, 4, 8) to shard the model across cards. What I meant is tensor_parallel_size set to 1, with every GPU loading a full copy of the model, then using data parallelism so the cards evaluate different slices of the same task's data. I recently implemented this with NumWorkerPartitioner; the key parameter configuration is below for anyone who needs it. @darrenglow. Thanks also to @tonysy. It would be great if this could be added to the documentation soon.
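A minimal sketch of the kind of setup being described, using plain dicts only. The names `NumWorkerPartitioner`, `LocalRunner`, and `num_worker` are modeled on OpenCompass conventions, but the exact fields here are assumptions, not a verified schema; check the current OpenCompass docs before copying:

```python
# Hedged sketch of data-parallel evaluation: tensor_parallel_size=1 so each
# GPU holds a full model copy, while NumWorkerPartitioner splits the task's
# data into shards that run in parallel. Field names are assumptions modeled
# on OpenCompass-style configs.
model_cfg = dict(
    type="VLLM",                # hypothetical model wrapper name
    path="your-model-path",     # placeholder
    model_kwargs=dict(tensor_parallel_size=1),  # no sharding: full model per GPU
    run_cfg=dict(num_gpus=1),   # each worker is pinned to a single GPU
)

infer = dict(
    # Split the dataset into 8 shards, one per available GPU.
    partitioner=dict(type="NumWorkerPartitioner", num_worker=8),
    runner=dict(
        type="LocalRunner",
        max_num_workers=8,      # run all 8 shards concurrently
        task=dict(type="OpenICLInferTask"),
    ),
)
```

The key invariant is that `num_worker` and `max_num_workers` match the GPU count, so every shard gets its own full-model worker.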
@Zbaoli I compared your parameters with mine and one is missing. Try adding this one?
@Zbaoli Strange 😂. What about prepending CUDA_VISIBLE_DEVICES before running the program? Or try debugging in /opencompass/opencompass/runners/local.py — it auto-detects the number of GPUs and so on. Want to add each other on WeChat? I'll email you.
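For reference, `CUDA_VISIBLE_DEVICES` can be set inline on the launch command; the child process (and any GPU auto-detection, such as in local.py) then only sees the listed devices. The `run.py` entry point in the comment is a placeholder here:

```shell
# Restrict the child process to GPUs 0 and 1; CUDA re-indexes them as 0 and 1.
CUDA_VISIBLE_DEVICES=0,1 sh -c 'echo "visible: $CUDA_VISIBLE_DEVICES"'
# A real launch would look like:
#   CUDA_VISIBLE_DEVICES=0,1 python run.py <your-config>
```

Setting the variable inline scopes it to that one command, so other shells and processes keep their own GPU visibility.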
> Please check your file list

Is this the one?