Results: 2 comments by personalorg
To overcome your GPU memory constraints, what about just decreasing batch size? On a 1080 Ti (11GB), I'm able to run 128 hidden units, 8 attention heads, 300 glove_dim, 300...
Good suggestion, Min. Since the paper compares against batch norm, have you found that layer norm generally outperforms batch norm lately? One could try batch norm also for comparison. Interestingly...
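For anyone comparing the two, here is a minimal pure-Python sketch of the difference in what each one normalizes over. It is only an illustration, not the paper's or any library's implementation: the `activations` values, the `EPS` constant, and the helper names are all made up for the example, and the learned scale/shift parameters and batch norm's running statistics are left out.

```python
from statistics import fmean, pvariance

EPS = 1e-5  # small stability constant (assumed value, not from the thread)


def _normalize(values):
    """Shift and scale a list of floats to zero mean and (near) unit variance."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]


def layer_norm(batch):
    """Normalize each example across its own features (row-wise)."""
    return [_normalize(row) for row in batch]


def batch_norm(batch):
    """Normalize each feature across the examples in the batch (column-wise)."""
    columns = [_normalize(list(col)) for col in zip(*batch)]
    return [list(row) for row in zip(*columns)]


# Hypothetical activations: 2 examples, 3 features each.
activations = [[1.0, 2.0, 3.0], [2.0, 4.0, 12.0]]
ln = layer_norm(activations)
bn = batch_norm(activations)
```

Because `layer_norm` only looks within a single example, its output for that example does not change when the batch size does, whereas `batch_norm` statistics depend on which examples share the batch. That may matter if the batch size is being reduced to fit in GPU memory.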