Yibo Jin
I ran into a similar problem. Has anyone found a solution?
> `Assertion failed: mpiSize == tp * pp` Did you run with mpi?

Hi, how did you solve this error? I ran into the same problem.
> For base models, please use the default template.

I'm using the chat model here. I also tried the default template and hit the same issue.
> > Did you find out why inference got slower after quantization? Comparing speeds before and after quantization in the same environment, the quantized model is actually slower.
>
> I noticed the same thing and don't know the cause.

It might be that the nodes inserted during quantization can't invoke the CUDA kernels.
Hi, could anyone help?
Hi~ Is it okay to load a LoRA and then run quantization? How should I set that up?
Hi, I ran into the same problem. Have you resolved it?
> @wanghongtai92: You will have to write custom CUDA kernels which, I assume, is a tall order.

Hi, I wonder whether that would work for cards with SM_75, like Tesla...
Hi~ I've seen the same error recently. Has there been any conclusion? @jamesdborin
It seems that torch.half() does not provide enough numeric range to complete the inference process, and a format with a wider dynamic range, such as torch.bfloat16, needs to be used instead.
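A minimal sketch (standard library only, no PyTorch required) illustrating the point above: fp16's largest finite value is about 65504, so activations beyond that overflow, while bfloat16 keeps fp32's full 8-bit exponent range at the cost of mantissa precision. The `to_bf16` helper here is a hypothetical illustration that emulates bfloat16 by truncating an fp32 bit pattern to its top 16 bits:

```python
import struct

def to_bf16(x: float) -> float:
    """Emulate bfloat16 rounding: keep the top 16 bits of an IEEE-754
    float32 (sign + 8-bit exponent + 7 mantissa bits), zero the rest."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# 70000.0 exceeds fp16's max finite value (~65504) and cannot be packed...
try:
    struct.pack(">e", 70000.0)  # ">e" is IEEE-754 half precision (fp16)
except OverflowError:
    print("fp16: overflow")

# ...but it fits easily in bf16, which shares fp32's exponent range
# (~3.4e38); only low mantissa bits are dropped.
print(to_bf16(70000.0))  # 69632.0
```

This is why switching a model that overflows in fp16 to bfloat16 can fix the inference, even though bf16 carries fewer mantissa bits per value.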