Can't reproduce the results for GLUE CoLA
My steps:
git clone https://github.com/microsoft/LoRA.git
cd LoRA
pip install -e .
cd examples/NLU
pip install -e .
Change export num_gpus=8 to export num_gpus=1 in roberta_large_cola.sh
Then CUDA_VISIBLE_DEVICES=0 bash roberta_large_cola.sh
Running on a single A100
Using:
- datasets 2.6.1
- python 3.9.13
- PyTorch 1.13.0+cu117
During training, eval_matthews_correlation is stuck at 0 for all epochs. I actually had the same issue on the current transformers version, and decreasing the learning rate plus removing warmup helped me get okay-ish numbers during training, but nothing as good as the reported 0.68. (An eval_matthews_correlation of exactly 0 usually means the classifier is predicting a single class for every example; see the sketch below.)
Do you have an idea of what I could be doing wrong?
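For context, the Matthews correlation is exactly 0 whenever the predictions carry no information about the labels, which in practice usually means the classifier has collapsed onto a single class (I believe the GLUE metric in datasets just wraps sklearn's matthews_corrcoef). A minimal sketch with made-up labels:

# Illustration only: a classifier that always predicts one class gets MCC = 0
from sklearn.metrics import matthews_corrcoef

labels = [1, 0, 1, 1, 0, 1, 0, 1]        # hypothetical CoLA labels (1 = acceptable)
preds = [1] * len(labels)                # collapsed model: always predicts "acceptable"
print(matthews_corrcoef(labels, preds))  # 0.0 (sklearn returns 0 when MCC is undefined)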
Update: using the settings below (compared with the original roberta_large_cola.sh, I changed per_device_train_batch_size 4 → 8, learning_rate 3e-4 → 2e-5, warmup_ratio 0.06 → 0.0, and weight_decay 0.1 → 0.0):
export num_gpus=1
export CUBLAS_WORKSPACE_CONFIG=":16:8" # https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility
export PYTHONHASHSEED=0
export output_dir="./roberta_cola_custom_sh"
python -m torch.distributed.launch --nproc_per_node=$num_gpus \
examples/text-classification/run_glue.py \
--model_name_or_path roberta-large \
--task_name cola \
--do_train \
--do_eval \
--max_seq_length 128 \
--per_device_train_batch_size 8 \
--learning_rate 2e-5 \
--num_train_epochs 20 \
--output_dir $output_dir/model \
--logging_steps 10 \
--logging_dir $output_dir/log \
--evaluation_strategy epoch \
--save_strategy epoch \
--warmup_ratio 0.0 \
--apply_lora \
--lora_r 8 \
--lora_alpha 16 \
--seed 0 \
--weight_decay 0.0
trains just fine: I no longer get eval_matthews_correlation = 0 during training.
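For anyone else chasing run-to-run reproducibility: as far as I know, CUBLAS_WORKSPACE_CONFIG is also the variable PyTorch requires when deterministic CUDA algorithms are enabled. A minimal sketch of the usual torch-side settings (illustrative only; run_glue.py handles seeding via --seed, so this is a general recipe rather than a claim about that script):

# Sketch of a reproducible PyTorch setup (not what run_glue.py necessarily does)
import os, random
import numpy as np
import torch

os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":16:8")  # deterministic cuBLAS workspaces
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)                      # seeds CPU and all CUDA devices
torch.backends.cudnn.benchmark = False    # disable non-deterministic kernel autotuning
torch.use_deterministic_algorithms(True)  # raise if a non-deterministic op is hit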
@fxmarty Asking because I can't get GLUE results as good as in the paper: if you have run other GLUE tasks as well, did you have to apply similar changes to the remaining tasks?
Also, do I understand correctly that, accounting for the number of GPUs, you reduced the total batch size from 4 * 8 = 32 to 8?
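For reference, with the HF Trainer the effective batch size is, as far as I understand, per_device_train_batch_size x number of GPUs x gradient_accumulation_steps. A quick sketch of the arithmetic:

# Effective batch size arithmetic (HF Trainer, one process per GPU)
def effective_batch_size(per_device: int, n_gpus: int, grad_accum: int = 1) -> int:
    return per_device * n_gpus * grad_accum

print(effective_batch_size(4, 8))     # original script: 4 per device x 8 GPUs = 32
print(effective_batch_size(8, 1))     # the single-GPU run above: 8
print(effective_batch_size(8, 1, 4))  # one way to get back to 32 on one GPU: gradient accumulation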
I am facing a similar problem: when I set num_gpus=2 and add gradient_accumulation_steps=4 (which keeps the total batch size at 32), the average over 5 random seeds on CoLA with roberta-large + LoRA is 67.0. These results follow the convention that "the result for each run is taken from the best epoch".
Does anyone know the solution? I assume that setting per_device_train_batch_size = 4 on a single GPU is equivalent to a total batch size of 4, which is the paper's setting, but I am still getting matthews_correlation = 0 during evaluation.