Xianjie Qiao
Results: 3 comments by Xianjie Qiao
Hi, does this per-token quantization patch only support a single card? I tested this patch on an A10 with llama2-7b, and there is no problem when I run it on a single card. But if...
pip install nvidia-cublas-cu12==12.3.4.1