Questions about reproducing GECA on the SCAN dataset

Open: Zhaoyi-Li21 opened this issue 3 years ago • 1 comment

Hello, Mr. Andreas. First, I'd like to thank you for this work and for open-sourcing the GECA code. I ran into a problem while trying to reproduce GECA on the SCAN dataset. I've read the shell file (for example, exp/scan_jump/retrieval/run.sh), so let me just copy it here:

#!/bin/sh

home="../../.."

for i in $(seq 0 9)
do

  python -u $home/compose.py \
    --dataset scan \
    --scan_data_dir $home/data/jda/SCAN \
    --dedup \
    --wug_size 1 \
    --seed $i \
    --model_type retrieval \
    --compute_adjacency \
    --n_sample 1000 \
    --write "composed.$i.json" \
    --nouse_trie \
    --max_comp_len 40 \
    --max_adjacencies 1000 \
    --TEST \
    > compose.$i.out 2> compose.$i.err

  python -u $home/eval.py \
    --dataset scan \
    --seed $i \
    --scan_data_dir $home/data/jda/SCAN \
    --augment composed.$i.json \
    --dedup \
    --aug_ratio 0.3 \
    --n_epochs 150 \
    --n_enc 512 \
    --sched_factor 0.5 \
    --dropout 0.5 \
    --lr 0.001 \
    --notest_curve \
    --TEST \
    > eval.$i.out 2> eval.$i.err

done

I have 2 questions.

Question #1: The script is clearly divided into 2 parts: the first composes the augmented data, and the second trains and evaluates your seq2seq model on the augmented training set. It seems that the first part is meant to sample 1000 augmented examples (--n_sample 1000) and write them to the composed file, but the composed file actually contains only around 400 augmented examples (in the scan_jump case, for example). Could you please tell me why there is a mismatch? :)
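One way to put exact numbers on the mismatch is to count the entries in each composed file, both in total and after removing exact duplicates. The following is only a minimal sketch: it assumes each composed.$i.json file holds a top-level JSON list with one entry per augmented example, which is a guess about the file format and not something confirmed by the repository. The helper name count_examples is hypothetical.

```python
import json
from pathlib import Path


def count_examples(path):
    """Return (total, unique) entry counts for one composed file.

    ASSUMPTION: the file is a top-level JSON list with one entry per
    augmented example. Adjust the parsing if the real format differs.
    """
    data = json.loads(Path(path).read_text())
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a JSON list, got {type(data).__name__}")
    # Serialize each entry canonically so that unhashable entries
    # (lists, dicts) can still be compared for exact duplicates.
    unique = {json.dumps(entry, sort_keys=True) for entry in data}
    return len(data), len(unique)


if __name__ == "__main__":
    # Seeds 0..9 match the loop in run.sh.
    for seed in range(10):
        path = Path(f"composed.{seed}.json")
        if path.exists():
            total, unique = count_examples(path)
            print(f"seed {seed}: {total} entries, {unique} unique (requested 1000)")
```

If the total is already close to 400, the shortfall arises before the file is written; if the total is larger and only the unique count is near 400, duplicates would be one possible explanation. Either reading is speculative without the author's confirmation.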

Question #2: I just used the augmented data already in the composed.$i.json files (because recomposing the data would take a really long time) and tried to reproduce the result reported in "Good-Enough Compositional Data Augmentation". I focused on the SCAN jump split. I ran 6 groups of experiments in total (each containing 10 individual runs) and got an average accuracy of 83.23%, which is slightly lower than the 87% reported in the paper. I am wondering whether this is due to improper hyperparameters or some other reason?
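When comparing an average like this against a published number, it can help to report the spread across runs as well as the mean. The sketch below shows one way to summarize several groups of per-seed accuracies. The function name summarize is hypothetical, and the numbers in the usage section are made-up placeholders, not results from these experiments.

```python
from statistics import mean, stdev


def summarize(groups):
    """Summarize accuracies given as a list of groups of per-seed values (percent)."""
    group_means = [mean(g) for g in groups]
    all_runs = [acc for g in groups for acc in g]
    return {
        "group_means": group_means,
        "overall_mean": mean(all_runs),
        # Sample standard deviation over every individual run.
        "overall_std": stdev(all_runs) if len(all_runs) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only (NOT real measurements).
    example_groups = [[80.0, 90.0], [85.0, 85.0]]
    summary = summarize(example_groups)
    print(f"mean = {summary['overall_mean']:.2f}%, std = {summary['overall_std']:.2f}")
```

A gap between 83.23% and 87% is easier to interpret when the standard deviation over the 60 runs is known, since a large spread across seeds could account for part of the difference.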

Thanks for reading this. I look forward to your reply!

Zhaoyi-Li21, Mar 04 '22 08:03

BTW, it seems that the accuracy improves when 'aug_ratio' is changed from 0.3 to 0.4 :)

Zhaoyi-Li21, Mar 05 '22 14:03