deepmind-research
Implementation of stochastic depth in NFNets
The stochastic depth implementation in the NFNet code appears to be batch-wise dropout, not the block-level dropout described in the paper.
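To make the distinction concrete, here is a minimal NumPy sketch of the two behaviours as I understand them. It is not the repository's code: `drop_mask` and `residual_block` are hypothetical names, and it assumes "batch-wise" means an independent keep/drop draw for each example in the batch, while "block-level" means a single draw that keeps or drops the residual branch for the whole batch.

```python
import numpy as np


def drop_mask(rng, batch_size, drop_rate, per_sample):
    """Return a 0/1 keep mask that broadcasts against an (N, H, W, C) tensor.

    per_sample=True  -> one independent draw per example (assumed "batch-wise").
    per_sample=False -> one draw shared by the whole batch (assumed "block-level").
    """
    keep_prob = 1.0 - drop_rate
    shape = (batch_size, 1, 1, 1) if per_sample else (1, 1, 1, 1)
    # floor(keep_prob + U[0, 1)) equals 1 with probability keep_prob, else 0.
    return np.floor(keep_prob + rng.uniform(size=shape))


def residual_block(x, branch_out, mask):
    """Skip connection plus the (possibly dropped) residual branch."""
    return x + branch_out * mask


rng = np.random.default_rng(0)
x = np.zeros((8, 4, 4, 3))      # dummy block input
branch = np.ones_like(x)        # dummy residual-branch output

# Each of the 8 examples keeps or drops the branch independently.
per_sample = residual_block(x, branch, drop_mask(rng, 8, 0.5, per_sample=True))

# All 8 examples share one decision, so the block is kept or skipped as a whole.
per_block = residual_block(x, branch, drop_mask(rng, 8, 0.5, per_sample=False))
```

With the per-sample mask, some examples in a batch pass through the branch while others skip it. With the shared mask, the output is the same for every example in that step, which is what I would expect from dropping the block itself.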