Zhaoyue Cheng
I believe fine-tuning can be done on a multi-GPU system by accumulating gradients in PyTorch.
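A minimal single-device sketch of gradient accumulation follows. It assumes a generic model, an `nn.MSELoss` loss and an iterable of equal-sized `(input, target)` micro-batches. The names `train_with_accumulation` and `accumulation_steps` are hypothetical and do not come from this repository's code.

```python
import torch
from torch import nn


def train_with_accumulation(model, batches, optimizer, accumulation_steps, loss_fn=None):
    """Step the optimizer once every `accumulation_steps` micro-batches.

    Returns the number of optimizer steps taken. Gradients from a trailing,
    incomplete accumulation window are discarded rather than applied.
    """
    if loss_fn is None:
        loss_fn = nn.MSELoss()  # hypothetical default loss for the sketch
    model.train()
    optimizer.zero_grad()
    steps = 0
    for i, (x, y) in enumerate(batches, start=1):
        loss = loss_fn(model(x), y)
        # Dividing by accumulation_steps makes the summed gradients equal the
        # gradient of the mean loss over the whole effective batch
        # (this holds when all micro-batches have the same size).
        (loss / accumulation_steps).backward()
        if i % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            steps += 1
    # Drop gradients left over from an incomplete final window.
    optimizer.zero_grad()
    return steps
```

With four micro-batches of size 2 and `accumulation_steps=4`, one optimizer step should match a single step on the full batch of 8. On several GPUs with `DistributedDataParallel`, a common pattern is to wrap the backward passes that do not end in an optimizer step in `model.no_sync()`. This skips the gradient all-reduce for those micro-batches. Treat it as a suggestion rather than something this repository does.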
I tried training with the default parameters, but F1/EM stay very low even after a long time: F1 is only around 10 after extended training.
Yeah, ELMo gives the BiDAF model a boost of about 4 points.
I don't actually have permission to merge PRs, but since this repository mostly runs on legacy Keras, would setting `TF_USE_LEGACY_KERAS=1` work as a workaround with the older Keras version? (also...
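Here is a minimal sketch of how the workaround might be applied from Python. It assumes TensorFlow 2.16 or later, where Keras 3 is the default, and that the Keras 2 package `tf_keras` is installed (for example via `pip install tf-keras`). The helper name `enable_legacy_keras` is hypothetical. The variable generally has to be set before TensorFlow is first imported, so the helper refuses to run afterwards.

```python
import os
import sys


def enable_legacy_keras() -> None:
    """Ask TensorFlow to resolve `tf.keras` to Keras 2 (`tf_keras`) instead of Keras 3.

    Assumption: TensorFlow >= 2.16 with the `tf_keras` package installed.
    The environment variable must be set before `import tensorflow`.
    """
    if "tensorflow" in sys.modules:
        # Setting it after the import would likely have no effect.
        raise RuntimeError("set TF_USE_LEGACY_KERAS before importing tensorflow")
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
```

Call `enable_legacy_keras()` at the very top of the entry script and then `import tensorflow as tf`. Setting the variable in the shell (e.g. `TF_USE_LEGACY_KERAS=1 python train.py`, where `train.py` stands in for whatever script you run) should have the same effect without code changes.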