
Fine-tuning 176B BLOOM with LoRA

Open drxmy opened this issue 3 years ago • 7 comments

The paper says that it only needs 350GB of VRAM to train the 175B GPT-3 with rank = 4. Can you elaborate on how this is done? For example, do you use Megatron-DeepSpeed?

In my experiment with bloom-3b, fine-tuning all parameters needs 29GB. After applying LoRA with different experimental settings, the number of trainable parameters ranges from 10M down to 0.8M, but all of the runs still need around 20GB of VRAM. I find this a little weird.
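For scale, here is the back-of-the-envelope arithmetic I had in mind (my own rough sketch; the byte counts are assumptions and it ignores activations and framework overhead, so real numbers will be higher):

```python
# Rough VRAM sketch: fp16 weights and grads (2 bytes each), fp32 Adam
# moments (8 bytes per *trainable* parameter).
def estimate_gb(total_params: float, trainable_params: float) -> float:
    weights = 2 * total_params          # fp16 copy of every weight
    grads = 2 * trainable_params        # gradients only for trainable weights
    adam_states = 8 * trainable_params  # fp32 first + second moments
    return (weights + grads + adam_states) / 1024**3

print(estimate_gb(3e9, 3e9))   # full fine-tuning of a 3B model -> ~33.5 GB
print(estimate_gb(3e9, 10e6))  # LoRA with 10M trainable params -> ~5.7 GB
```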

drxmy avatar Dec 27 '22 05:12 drxmy

Hi! We had a proprietary setup. Are you using Adam, and have you made sure not to pass the non-trainable parameters to the optimizer?
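For reference, a minimal sketch of that setup (assuming a PyTorch `model` whose layers have already been swapped for this repo's `loralib` equivalents):

```python
import torch
import loralib as lora

# Freeze everything except the LoRA matrices (helper provided by loralib).
lora.mark_only_lora_as_trainable(model)

# Hand the optimizer only the trainable parameters; otherwise Adam/AdamW
# allocates two fp32 state tensors for every frozen weight as well.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```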

edwardjhu avatar Dec 27 '22 11:12 edwardjhu

I used AdamW with transformers' Trainer class (Hugging Face). It printed a trainable parameter count, and the number was much smaller with LoRA.
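For what it's worth, a quick way to double-check that count independently of the Trainer (plain PyTorch, assuming only that `model` is an `nn.Module`):

```python
# Count parameters that will actually receive gradients.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.4f}%)")
```

Note that even with few trainable parameters, the frozen weights and the activations stored to backpropagate through the frozen layers still occupy VRAM, which may account for part of the ~20GB floor.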

drxmy avatar Jan 03 '23 07:01 drxmy

> The paper says that it only needs 350GB of VRAM to train the 175B GPT-3 with rank = 4. Can you elaborate on how this is done? For example, do you use Megatron-DeepSpeed?
>
> In my experiment with bloom-3b, fine-tuning all parameters needs 29GB. After applying LoRA with different experimental settings, the number of trainable parameters ranges from 10M down to 0.8M, but all of the runs still need around 20GB of VRAM. I find this a little weird.

Hello, can I check with you how to use LoRA to fine-tune Bloom-3B? I ran into the issue that Bloom-3B has no v_proj or q_proj in the base model. Thanks a lot!

aegisgpt avatar Mar 07 '23 09:03 aegisgpt

@aegisgpt

> having no v_proj and q_proj in the base model

Per https://huggingface.co/smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM/blob/main/adapter_config.json , you need to change the target modules to `query_key_value` for BLOOM models. Let me know if that solves your problem.
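For example, with Hugging Face PEFT the config would look roughly like this (the r/alpha/dropout values are illustrative, not a recommendation):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")

# BLOOM fuses Q/K/V into a single linear layer, so target that module
# instead of the q_proj/v_proj names used by OPT/LLaMA-style models.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```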

zsc avatar Mar 20 '23 04:03 zsc

> Per https://huggingface.co/smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM/blob/main/adapter_config.json , you need to change the target modules to `query_key_value` for BLOOM models. Let me know if that solves your problem.

Hey @zsc, many thanks! I tried it and it worked! Do you mind sharing where I can find more detailed documentation for LoRA online, especially with regard to configurations for various types of GPT models?

aegisgpt avatar Mar 21 '23 08:03 aegisgpt

This may be useful: https://github.com/huggingface/peft/blob/main/src/peft/mapping.py
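If a model family isn't covered there, one generic way to discover candidate `target_modules` yourself is to list the model's linear layers (a sketch, assuming a transformers causal-LM checkpoint):

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")

# Print the names of all linear submodules; attention projections such as
# "query_key_value" (BLOOM) or "q_proj"/"v_proj" (OPT/LLaMA-style models)
# show up here and can be used as LoRA target modules.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        print(name)
```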

zsc avatar Mar 21 '23 09:03 zsc

> This may be useful: https://github.com/huggingface/peft/blob/main/src/peft/mapping.py

Thank you! That helps!

aegisgpt avatar Mar 21 '23 11:03 aegisgpt