Trung Huynh
We have written a small script that converts models trained with QLoRA to CTranslate2 format to speed up inference: https://github.com/Actable-AI/llm-utils/blob/main/qlora2ct2/convert_qlora2_ct2.py
For us, it gives a 4x-6x speedup.
We have successfully converted mt0-xxl on an RTX6000.
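
For anyone wondering what such a conversion usually involves, here is a rough sketch. It is not the linked script, just the general two-step idea as I understand it: merge the LoRA adapter into a half-precision copy of the base model with peft, then run CTranslate2's `ct2-transformers-converter` on the merged checkpoint. The function names, directory names, and the `int8` quantization default are my own assumptions for illustration.

```python
import subprocess


def merge_adapter(base_model: str, adapter_dir: str, merged_dir: str) -> None:
    """Merge a LoRA adapter into the base model and save a plain checkpoint.

    Assumes a seq2seq base model (such as mt0-xxl) and enough memory to hold
    it in float16. Heavy imports are kept local to this function.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Reload the base weights in half precision rather than 4-bit,
    # so the adapter deltas can be folded into ordinary weight tensors.
    base = AutoModelForSeq2SeqLM.from_pretrained(base_model, torch_dtype=torch.float16)
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(merged_dir)
    # The converter expects the tokenizer files next to the model weights.
    AutoTokenizer.from_pretrained(base_model).save_pretrained(merged_dir)


def ct2_convert_command(merged_dir: str, output_dir: str, quantization: str = "int8") -> list:
    """Build the ct2-transformers-converter command line (int8 is an assumed default)."""
    return [
        "ct2-transformers-converter",
        "--model", merged_dir,
        "--output_dir", output_dir,
        "--quantization", quantization,
    ]


def convert(base_model: str, adapter_dir: str, merged_dir: str, output_dir: str) -> None:
    """Run both steps: merge the adapter, then convert to CTranslate2 format."""
    merge_adapter(base_model, adapter_dir, merged_dir)
    subprocess.run(ct2_convert_command(merged_dir, output_dir), check=True)
```

Usage would be something like `convert("bigscience/mt0-xxl", "my-qlora-adapter", "merged", "mt0-xxl-ct2")`, where the adapter and output directory names are placeholders. The speedup you get will depend on the model, the quantization you pick, and your hardware.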