error when using Qwen1.5-32B

Open puppet101 opened this issue 1 year ago • 1 comments

Hi, I am getting an error when serving Qwen1.5-32B with FP6 quantization. Could you please help? Thank you. My code is below:

    import mii

    pipe = mii.pipeline('/mymodel/Qwen1.5-32B-fp16', quantization_mode='wf6af16', tensor_parallel=8)
    response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
    print(response)

And the error is:

    Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
    Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
    python: /root/miniconda3/envs/pytorch22/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/weight_prepacking.h:151: void weight_matrix_prepacking(int*, size_t, size_t): Assertion `K % 64 == 0' failed.
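
A note on the assertion: it requires the `K` argument passed to `weight_matrix_prepacking` to be a multiple of 64. One possible cause, not confirmed anywhere in this thread, is that splitting a weight matrix across 8 tensor-parallel ranks leaves a per-rank dimension that is no longer a multiple of 64. The sketch below only checks that arithmetic. The helper `shard_k_is_aligned` is hypothetical (it is not part of DeepSpeed or MII), the dimension values are assumptions that should be verified against the model's `config.json`, and which dimension the kernel actually receives as `K` is also an assumption.

```python
def shard_k_is_aligned(k: int, tensor_parallel: int, alignment: int = 64) -> bool:
    """Hypothetical helper: True if k splits evenly across ranks and
    each per-rank shard is a multiple of `alignment`."""
    if k % tensor_parallel != 0:
        return False
    return (k // tensor_parallel) % alignment == 0


# ASSUMED values for Qwen1.5-32B; verify them against the model's config.json.
ASSUMED_DIMS = {"hidden_size": 5120, "intermediate_size": 27392}

for name, k in ASSUMED_DIMS.items():
    for tp in (1, 2, 4, 8):
        status = "ok" if shard_k_is_aligned(k, tp) else "NOT a multiple of 64 per rank"
        print(f"{name}={k}, tensor_parallel={tp}: {status}")
```

If the assumed dimensions are right, only `intermediate_size` with `tensor_parallel=8` fails the check (27392 / 8 = 3424, and 3424 % 64 = 32), so trying a smaller `tensor_parallel` value may be worth a test. This is a guess at the cause, not a verified fix.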

Apr 29 '24 06:04 puppet101

Any news on this? I'm facing the same error.

Sep 30 '24 13:09 celiolarcher