LLaMA-Pro

[ACL 2024] Progressive LLaMA with Block Expansion.

24 LLaMA-Pro issues

After fine-tuning llama-3-8B-instruct with the same configuration as the code from https://github.com/hiyouga/LLaMA-Factory/tree/3df986c6793a51ec2cb5f31fd1808cd3a9883bc4/examples/extras/llama_pro, it always leads to an apparent loss of the original ability? I only used the train dataset "Identity". Can you help?...
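For anyone debugging this, a hedged sanity-check sketch (my own, not code from LLaMA-Factory or this repository): after block expansion, only the newly inserted layers should receive gradients; if the original layers are also trainable, the base instruct abilities can drift. The model path and layer indices below are hypothetical placeholders.

```
# Sanity-check sketch: make only the newly inserted decoder layers trainable
# and report how many parameters will actually be updated.
import torch
from transformers import AutoModelForCausalLM

EXPANDED_MODEL = "path/to/expanded-llama-3-8b-instruct"   # hypothetical path
NEW_LAYER_IDS = {4, 9, 14, 19, 24, 29, 34, 39}            # hypothetical indices of the inserted blocks

model = AutoModelForCausalLM.from_pretrained(EXPANDED_MODEL, torch_dtype=torch.bfloat16)
for name, param in model.named_parameters():
    # Train only parameters that live inside one of the newly added decoder layers.
    in_new_block = any(f"model.layers.{i}." in name for i in NEW_LAYER_IDS)
    param.requires_grad = in_new_block

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable / 1e9:.2f}B of {total / 1e9:.2f}B")
```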

Thanks for your effort. I'm a little confused about the process; correct me if I'm wrong. First, we should run block_expansion.py to create our extended model. Then, we clone...
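For readers following this thread, here is a minimal sketch of what the block-expansion step generally does, assuming a LLaMA-style model loaded via transformers; this is not the official block_expansion.py, and the model name and counts are illustrative:

```
# Block-expansion sketch: copy every k-th decoder layer and zero its output
# projections so each new block starts out as an identity mapping (thanks to
# the residual connections around attention and MLP).
import copy
import torch
from transformers import AutoModelForCausalLM

def expand_blocks(model_name="meta-llama/Llama-2-7b-hf", num_expand=8):
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    layers = model.model.layers
    step = len(layers) // num_expand          # e.g. 32 // 8 = 4
    new_layers = torch.nn.ModuleList()
    for i, layer in enumerate(layers):
        new_layers.append(layer)
        if (i + 1) % step == 0:               # insert a copy after every `step` layers
            new_layer = copy.deepcopy(layer)
            # Zero the output projections so the copied block contributes nothing
            # at initialization and behaves as an identity function.
            new_layer.self_attn.o_proj.weight.data.zero_()
            new_layer.mlp.down_proj.weight.data.zero_()
            new_layers.append(new_layer)
    model.model.layers = new_layers
    model.config.num_hidden_layers = len(new_layers)
    # Note: per-layer index bookkeeping (e.g. layer_idx used by KV caching in
    # newer transformers versions) is omitted in this sketch.
    return model
```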

Thanks for your great work. I wonder if it is possible to directly use alpaca_lora or stanford_alpaca to fine-tune the 8B model on an arbitrary dataset. Can we access the code? Or...

Hi, I would like to know when the pretrain code will be released?

May I ask how much GPU memory LLaMA-Pro training requires, and how much more it needs compared with LoRA?
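A very rough back-of-envelope comparison, under my own assumptions (bf16 weights, AdamW with fp32 moments and master weights, activations and framework overhead ignored); these are not official numbers from the authors:

```
# Rough memory estimate for full-model weights plus per-parameter training state.
GB = 1024 ** 3

def training_memory_gb(total_params, trainable_params):
    weights   = total_params * 2            # bf16 copy of the full model
    grads     = trainable_params * 2        # bf16 gradients for trainable params only
    optimizer = trainable_params * 12       # fp32 master weights + Adam m and v
    return (weights + grads + optimizer) / GB

llama_pro = training_memory_gb(8.3e9, 1.3e9)   # assume ~1.3B params in the 8 new blocks
lora      = training_memory_gb(7.0e9, 20e6)    # assume LoRA on 7B with ~20M adapter params
print(f"LLaMA-Pro ~{llama_pro:.0f} GB, LoRA ~{lora:.0f} GB (excluding activations)")
```

Under these assumptions the gap is dominated by the optimizer state for the ~1.3B trainable parameters, so LLaMA-Pro training needs roughly 2-3x the memory of a LoRA run before activations are counted.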

Hi there! It's really interesting work, but I have the following questions: 1. I think the proposed block expansion is quite similar to the idea of Adapter Tuning; can you...

If we want to apply llama-pro to larger models such as 34B or 72B, does the number of blocks need to be increased proportionally? Have any experiments been done on this?

`expand.sh` for llama_pro not working by default without fire lib proof:
```
(llama_factory) administrator@srv01:/home/jupyter/LLaMA-Factory/examples/extras/llama_pro$ bash expand.sh
Traceback (most recent call last):
  File "/home/jupyter/LLaMA-Factory/examples/extras/llama_pro/../../../scripts/llama_pro.py", line 11, in <module>
    import fire
ModuleNotFoundError: No...
```

Hello author, you have undoubtedly done excellent work on incremental learning for LLMs. (All mentions of "llama" below refer to llama2.) But I have one question about llama-pro. LLaMA-Pro adds 8 identity blocks on top of llama and then does full-parameter training on a general corpus, so I can assume that at that point the 8 identity blocks have degenerated into blocks similar to the other 32 layers, and llama-pro has simply become an ordinary 8.3B llama. In the subsequent code and math pre-training, the 32 blocks of llama-8.3B are frozen and only the 8 blocks are fine-tuned (these 8 blocks of course stay at the positions where they were inserted). So the authors should run an experiment that freezes 24 blocks of llama-7b and fine-tunes 8 of them (in order: freeze 3 blocks, then unfreeze 1), which would show whether adding 8 new blocks for fine-tuning is actually necessary. See the sketch below...
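A minimal sketch of the ablation proposed above, as I read it (not an experiment from the paper): keep the original 32-layer model, freeze 3 out of every 4 decoder layers, and fine-tune the remaining 8 in place so the trainable budget roughly matches LLaMA-Pro's 8 expanded blocks.

```
# Ablation sketch: unfreeze every 4th decoder layer of llama2-7b and freeze
# everything else, instead of adding new blocks.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
for i, layer in enumerate(model.model.layers):
    trainable = (i + 1) % 4 == 0          # unfreeze layers 3, 7, 11, ..., 31
    for param in layer.parameters():
        param.requires_grad = trainable

# Embeddings, final norm, and lm_head stay frozen in this sketch.
for module in (model.model.embed_tokens, model.model.norm, model.lm_head):
    for param in module.parameters():
        param.requires_grad = False
```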

Hello, a few questions: 1. When pre-training with LoRA, only the newly added LoRA parameters are trained as well, so what is the advantage over LoRA? 2. With this way of pre-training to avoid forgetting, when adding domain data, do we still need to mix in an appropriate amount of general data? 3. The SFT stage uses full-parameter training, right? Then forgetting still cannot be avoided at the SFT stage.