Jiacheng Ye
I got the same error. Is it caused by the deepspeed version?
Same issue for me. I'm using 4*A100 80G on openwebtext, and I changed batch_size 12 -> 24 and gradient_accumulation_steps 5*8 -> 5*4.
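If these are nanoGPT-style openwebtext settings (an assumption on my part; the exact accumulation semantics depend on the repo), the adjustment keeps the number of sequences per optimizer step unchanged when going from 8 to 4 GPUs. A quick sanity check:

```python
# Hedged sketch: verify the adjusted config consumes the same number of
# sequences per optimizer step. Assumes gradient_accumulation_steps is the
# total across all GPUs (the helper name is made up for illustration).
def sequences_per_step(batch_size: int, grad_accum_steps: int) -> int:
    """Sequences consumed per optimizer step across all GPUs."""
    return batch_size * grad_accum_steps

old = sequences_per_step(batch_size=12, grad_accum_steps=5 * 8)  # original 8-GPU config
new = sequences_per_step(batch_size=24, grad_accum_steps=5 * 4)  # adjusted 4-GPU config
print(old, new)  # 480 480
```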
I use fp16; disabling flash attention works for me.
Btw, this is the system info:
```
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
```
Hi, it's weird, as the settings in run_epr.sh are the same as those in the paper. Could you check whether you can obtain similar results to the paper for other...
I've figured out solutions to the questions above. With the default parameters in the codebase, I got 26.15 with BM25. However, EPR performs even worse (22.9) after training the BERT-based retriever....
Hi, here is the full list of commands:
```
#!/bin/bash
#SBATCH --job-name=epr_mtop-null_v4
#SBATCH --output=outputs/epr_mtop-null_v4/out.txt
#SBATCH --error=outputs/epr_mtop-null_v4/out.txt
#SBATCH --partition=NLP
#SBATCH --time=12000
#SBATCH --quotatype=reserved
#SBATCH --gres=gpu:2

srun python find_bm25.py output_path=$PWD/data/bm25_mtop-null_a_train.json \
    dataset_split=train...
```
I got 49.17 after training for 120 epochs on mtop; it's still weird... 😂
Hi Ohad, do you have any updates? :)
Hi, sorry for the late reply. It seems to be a data loader issue. You can check whether the qnli dataset was correctly downloaded from huggingface.
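One quick way to check is to look for cached files for the dataset on disk. This is a sketch that assumes the `datasets` library's default cache location under `~/.cache/huggingface/datasets` (adjust if `HF_DATASETS_CACHE` is set); the helper name is hypothetical:

```python
# Hedged sketch: check whether any cached files for a dataset (e.g. qnli)
# exist under the Hugging Face datasets cache directory. If nothing is found,
# the download likely failed and the dataset should be re-fetched.
from pathlib import Path

def dataset_cached(cache_dir: Path, name: str) -> bool:
    """Return True if any file or directory under cache_dir mentions `name`."""
    return any(cache_dir.glob(f"**/*{name}*"))

default_cache = Path.home() / ".cache" / "huggingface" / "datasets"
print(dataset_cached(default_cache, "qnli"))
```

If this prints `False`, re-downloading the dataset (e.g. deleting the partial cache entry and loading it again) is worth trying before digging further into the data loader.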