Results: 2 comments of Guo Yejun
> > Just another idea: shall we change "recompute_fwd" from a bool to a float in [0.0, 1.0] (and rename it to recompute_fwd_factor), since not all GEMMs are recomputed in...
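A minimal sketch of how such a float factor might enter a FLOPs estimate. The function name `training_gemm_flops` and its signature are hypothetical and not taken from the code under review, and the backward pass is assumed to cost twice the forward GEMM FLOPs, which is a common rule of thumb rather than something stated in the comment.

```python
def training_gemm_flops(fwd_flops: float, recompute_fwd_factor: float = 0.0) -> float:
    """Estimate GEMM FLOPs for one training step (hypothetical helper).

    fwd_flops: GEMM FLOPs of a single forward pass.
    recompute_fwd_factor: fraction of forward GEMMs that are recomputed
        during the backward pass, from 0.0 (none) to 1.0 (all).
    """
    if not 0.0 <= recompute_fwd_factor <= 1.0:
        raise ValueError("recompute_fwd_factor must be within [0.0, 1.0]")

    # Assumption: backward GEMMs cost about 2x the forward GEMMs
    # (one GEMM for the input gradient, one for the weight gradient).
    bwd_flops = 2.0 * fwd_flops

    # Only the recomputed fraction of the forward pass is paid a second time.
    recompute_flops = recompute_fwd_factor * fwd_flops

    return fwd_flops + bwd_flops + recompute_flops
```

With this shape, 0.0 and 1.0 reproduce the two cases of the old bool, and intermediate values cover partial recomputation.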
not know why we need to refer to the loss computation code, it is just a simple formula of input/output of a transformer block. For simplicity, let's use k=1 and...