Results: 2 comments by Dan Yao
Because qloop's fwd and bwd use different layouts, we refactored dropout to decouple fwd and bwd, but the refactored dropout adds a lot of overhead to fwd, there are...
I think we should combine backward and backward_dropout.