Trung Huynh

3 comments by Trung Huynh

We have written a small script that converts models trained with QLoRA to CTranslate2 format to speed up inference: https://github.com/Actable-AI/llm-utils/blob/main/qlora2ct2/convert_qlora2_ct2.py
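
For readers who want the general shape of such a conversion, here is a minimal sketch. It is not the linked script. It assumes the usual two-step approach: merge the LoRA adapter into a half-precision copy of the base model with `peft`, then run CTranslate2's `ct2-transformers-converter` on the merged checkpoint. The function names, directory names and the `int8` quantization default are illustrative assumptions, and a seq2seq base model is assumed.

```python
import shlex


def merge_adapter(base_model_name, adapter_dir, merged_dir):
    """Merge LoRA adapter weights into the base model and save a plain
    Transformers checkpoint (assumes a seq2seq base model)."""
    # Heavy imports are kept local so the module loads without them installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Load the base model in fp16 (not 4-bit) so the adapter can be merged.
    base = AutoModelForSeq2SeqLM.from_pretrained(
        base_model_name, torch_dtype=torch.float16
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # folds the LoRA deltas into the weights
    merged.save_pretrained(merged_dir)
    AutoTokenizer.from_pretrained(base_model_name).save_pretrained(merged_dir)


def ct2_convert_command(merged_dir, output_dir, quantization="int8"):
    """Build the CTranslate2 converter command for the merged checkpoint."""
    return [
        "ct2-transformers-converter",
        "--model", merged_dir,
        "--output_dir", output_dir,
        "--quantization", quantization,
    ]


if __name__ == "__main__":
    # Hypothetical paths; run merge_adapter(...) first, then execute this
    # command, e.g. with subprocess.run(cmd, check=True).
    cmd = ct2_convert_command("merged_model", "ct2_model")
    print(shlex.join(cmd))
```

The merge step needs enough memory to hold the full base model in fp16, since the adapter is merged into unquantized weights.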

For us, it gives a 4x-6x speedup.

We have successfully converted mt0-xxl on an RTX6000.