Zhanfeng Mo
I got results similar to yours using the recommended hyperparameters in https://github.com/jiaweizzhao/GaLore/issues/25#issuecomment-2003124947. ```{python} python run_glue.py \ --model_name_or_path roberta-base \ --task_name mrpc \ --enable_galore \ --lora_all_modules \ --max_length 512 \...
I have the same issue. I have checked that the gradient norm and the learning rate are not zero. In the original code, once the metric is initialized, it was not...
> Hello, thanks for your inspiring and excellent work! > > I want to try full fine-tuning to compare with GaLore, and I have disabled GaLore. However,...
> ### Describe the issue > First of all, thank you for your great contributions. > > I have a similar question to the [issue 146](https://github.com/microsoft/LLMLingua/issues/146), I cannot reproduce the...