Xianjie Qiao

3 comments by Xianjie Qiao

Hi, does this per-token quantization patch only support a single card? I tested the patch on an A10 with llama2-7b, and there is no problem when I run on a single card. But if...

pip install nvidia-cublas-cu12==12.3.4.1