Trung Huynh comments


4bit inference is slow

We have written a small script that converts models trained with QLoRA to CTranslate2 format to speed up inference: https://github.com/Actable-AI/llm-utils/blob/main/qlora2ct2/convert_qlora2_ct2.py
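For readers who want a rough idea of what such a conversion involves, here is a minimal sketch. It is not the contents of the linked script. It assumes the usual two-step approach: fold the LoRA adapter into a half-precision copy of the base model with `peft`, then run CTranslate2's `ct2-transformers-converter` CLI on the merged checkpoint. The function names, directory names, the choice of `AutoModelForSeq2SeqLM`, and the `int8_float16` quantization are illustrative assumptions.

```python
import shlex


def merge_adapter(base_model_id, adapter_dir, merged_dir):
    """Fold LoRA adapter weights into the base model and save the result.

    Hypothetical helper: assumes a seq2seq base model and that torch,
    transformers and peft are installed.
    """
    # Imported lazily so the command builder below works without these packages.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Load the base model in half precision (not 4-bit) so the adapter
    # deltas can be merged into ordinary weight matrices.
    base = AutoModelForSeq2SeqLM.from_pretrained(
        base_model_id, torch_dtype=torch.float16
    )
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(merged_dir)
    # The converter expects the tokenizer files next to the weights.
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(merged_dir)


def ct2_convert_command(merged_dir, output_dir, quantization="int8_float16"):
    """Build the ct2-transformers-converter invocation as an argument list."""
    return [
        "ct2-transformers-converter",
        "--model", merged_dir,
        "--output_dir", output_dir,
        "--quantization", quantization,
    ]


if __name__ == "__main__":
    # Print the command; run it with subprocess.run(cmd, check=True)
    # after merge_adapter() has written the merged checkpoint.
    cmd = ct2_convert_command("merged-model", "ct2-model")
    print(shlex.join(cmd))
```

Merging in half precision rather than on the 4-bit weights is a design choice in this sketch: it needs enough memory to hold the full fp16 model, but it produces a plain checkpoint that the converter can read directly.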

4bit inference is slow

For us, it's a 4x-6x speedup.

4bit inference is slow

We have successfully converted mt0-xxl on an RTX6000.