personalorg comments

Results: 2 comments of personalorg

TODOs

To overcome your GPU memory constraints, what about just decreasing batch size? On a 1080 Ti (11GB), I'm able to run 128 hidden units, 8 attention heads, 300 glove_dim, 300...

Good suggestion, Min. Since the paper compares against batch norm, have you found that layer norm generally outperforms batch norm lately? One could try batch norm also for comparison. Interestingly...
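
For readers comparing the two normalizers mentioned above, here is a minimal pure-Python sketch of how they differ. It is not the code from the paper or the repository under discussion. The function names, the `EPS` constant, and the toy batch are all illustrative assumptions, and the learnable scale and shift parameters as well as batch norm's running statistics are left out.

```python
from statistics import fmean, pvariance

EPS = 1e-5  # assumed small constant for numerical stability


def layer_norm(batch):
    """Normalize each example (row) across its own features."""
    out = []
    for row in batch:
        mu, var = fmean(row), pvariance(row)
        out.append([(x - mu) / (var + EPS) ** 0.5 for x in row])
    return out


def batch_norm(batch):
    """Normalize each feature (column) across the examples in the batch."""
    columns = list(zip(*batch))
    stats = [(fmean(col), pvariance(col)) for col in columns]
    return [
        [(x - mu) / (var + EPS) ** 0.5 for x, (mu, var) in zip(row, stats)]
        for row in batch
    ]


# Toy batch: 2 examples with 3 features each (made-up values).
batch = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
ln = layer_norm(batch)
bn = batch_norm(batch)
```

In this sketch, layer norm computes its statistics per example, so its output for a given example does not change when the batch shrinks. Batch norm computes its statistics across the batch, so they get noisier as the batch gets smaller, which may matter when batch size is reduced to fit GPU memory.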