Guo Yejun

2 comments by Guo Yejun

> > Just another idea: shall we change "recompute_fwd" from a bool to a float in [0.0, 1.0] (and also rename it to recompute_fwd_factor), since not all GEMMs are recomputed in...
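A minimal sketch of how a fractional recompute factor could enter a FLOPs estimate. It assumes the common accounting where backward costs about 2x forward, so training costs 3x forward FLOPs with no recomputation and 4x with full recomputation. A factor in [0.0, 1.0] then interpolates linearly. The helper names `forward_gemm_flops` and `training_flops` are hypothetical, and the per-layer GEMM count uses the standard 24·B·s·h² + 4·B·s²·h approximation for a transformer block with a 4h MLP, ignoring embeddings and logits.

```python
def forward_gemm_flops(batch: int, seq_len: int, hidden: int, num_layers: int) -> int:
    """Approximate forward matmul FLOPs (hypothetical helper).

    Per layer: QKV projection (6*B*s*h^2), attention scores (2*B*s^2*h),
    attention-weighted values (2*B*s^2*h), output projection (2*B*s*h^2),
    and a 4h MLP (16*B*s*h^2), for 24*B*s*h^2 + 4*B*s^2*h in total.
    """
    per_layer = 24 * batch * seq_len * hidden**2 + 4 * batch * seq_len**2 * hidden
    return num_layers * per_layer


def training_flops(fwd_flops: float, recompute_fwd_factor: float) -> float:
    """Training FLOPs for one step (hypothetical helper).

    forward (1x) + backward (about 2x) + the recomputed fraction of forward.
    recompute_fwd_factor=0.0 means no recomputation; 1.0 means every forward
    GEMM is recomputed. The old bool maps cleanly: False -> 0.0, True -> 1.0.
    """
    f = float(recompute_fwd_factor)
    if not 0.0 <= f <= 1.0:
        raise ValueError("recompute_fwd_factor must be in [0.0, 1.0]")
    return (3.0 + f) * fwd_flops


if __name__ == "__main__":
    fwd = forward_gemm_flops(batch=1, seq_len=2048, hidden=4096, num_layers=32)
    for factor in (0.0, 0.5, 1.0):
        print(factor, training_flops(fwd, factor))
```

With a float factor, selective recomputation (for example, recomputing only the attention GEMMs) can report a value between the "off" (3x) and "full" (4x) cases. A bool can only express those two endpoints.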

I don't know why we need to refer to the loss computation code; it is just a simple formula for the input/output of a transformer block. For simplicity, let's use k=1 and...