Zhanfeng Mo comments

Results: 4 comments by Zhanfeng Mo

Questions about reproducing the result of "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks"

I got results similar to yours using the recommended hyperparameters in https://github.com/jiaweizzhao/GaLore/issues/25#issuecomment-2003124947. ```{python} python run_glue.py \ --model_name_or_path roberta-base \ --task_name mrpc \ --enable_galore \ --lora_all_modules \ --max_length 512 \...

(Question) About glue tasks

I have the same issue. I have checked that the gradient norm and the learning rate are both nonzero. In the original code, once the metric is initialized, it was not...

(Question) About glue tasks

> Hello, thanks for your inspiring and excellent work!
>
> I want to try full fine-tuning to compare with GaLore, and I have disabled GaLore. However,...

[Question]: Reproduce LLMLingua-2 results with Mistral-7B

> ### Describe the issue
> First of all, thank you for your great contributions.
>
> I have a question similar to [issue 146](https://github.com/microsoft/LLMLingua/issues/146): I cannot reproduce the...