Hugo Pitorro

6 comments by Hugo Pitorro

> > I tried using this branch but got an error about not getting expected number of gradients during backward (15 vs 16) > > Yeah, I got the same...

> > > > > I tried using this branch but got an error about not getting expected number of gradients during backward (15 vs 16) > > > >...

> > > > > > > I tried using this branch but got an error about not getting expected number of gradients during backward (15 vs 16) > >...

I found that disabling AMP did not seem to help in my use case, and lowering the LR just hurts metric performance on this particular dataset. Does anyone have...

Thank you for the quick reply. My reason for writing the parallel code this way was so that the decay would start from the first non-pad token rather than from an arbitrary `decay**idx`. I'll...
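
To illustrate the difference, here is a minimal sketch. It assumes left-padded sequences with a 0/1 mask (1 = real token) and a parallel form that scales each position by `decay**idx`. The helper names `relative_positions`, `decay_weights` and `absolute_decay_weights` are hypothetical and do not come from the code under discussion.

```python
def relative_positions(mask):
    """Position of each token counted from the first non-pad token.

    Assumes a left-padded, contiguous 0/1 mask (1 = real token).
    Pad slots get 0 here; they are zeroed out by the mask downstream.
    """
    positions, seen = [], 0
    for m in mask:
        if m:
            positions.append(seen)
            seen += 1
        else:
            positions.append(0)
    return positions


def decay_weights(mask, decay):
    """decay**pos, with pos relative to the first non-pad token (hypothetical helper)."""
    return [decay ** p if m else 0.0 for p, m in zip(relative_positions(mask), mask)]


def absolute_decay_weights(mask, decay):
    """Naive variant: decay**idx with the absolute index, so padding shifts the start."""
    return [decay ** i if m else 0.0 for i, m in enumerate(mask)]


mask = [0, 0, 1, 1, 1]
print(decay_weights(mask, 0.5))           # [0.0, 0.0, 1.0, 0.5, 0.25]
print(absolute_decay_weights(mask, 0.5))  # [0.0, 0.0, 0.25, 0.125, 0.0625]
```

With relative positions, the first real token always gets weight `1.0` no matter how much padding precedes it. With absolute indices, its weight depends on the pad count.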

Hi @zigzagcai, thank you for the very helpful code! I've been experimenting with it but have been struggling to get generation to work properly. Namely, I'm packing sequences into `(1, sum(seq_len),...
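
For reference, here is a minimal sketch of the packed layout described above. Token IDs are concatenated into one row of length `sum(seq_len)`, and the sketch also builds cumulative boundaries and per-sequence position IDs. The name `cu_seqlens` follows the FlashAttention varlen convention. Whether the code under discussion uses the same names or metadata is an assumption, and `pack_sequences`/`unpack` are hypothetical helpers.

```python
from typing import List, Tuple


def pack_sequences(seqs: List[List[int]]) -> Tuple[List[int], List[int], List[int]]:
    """Concatenate variable-length sequences into one row of length sum(seq_len).

    Returns:
        packed: all tokens back to back (conceptually shape (1, sum(seq_len))).
        cu_seqlens: cumulative boundaries [0, len0, len0 + len1, ...].
        position_ids: index of each token within its own sequence, resetting to 0.
    """
    packed, cu_seqlens, position_ids = [], [0], []
    for seq in seqs:
        packed.extend(seq)
        position_ids.extend(range(len(seq)))
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return packed, cu_seqlens, position_ids


def unpack(packed: List[int], cu_seqlens: List[int]) -> List[List[int]]:
    """Split a packed row back into its original sequences using the boundaries."""
    return [packed[s:e] for s, e in zip(cu_seqlens[:-1], cu_seqlens[1:])]


packed, cu, pos = pack_sequences([[5, 6, 7], [8], [9, 10]])
print(packed)  # [5, 6, 7, 8, 9, 10]
print(cu)      # [0, 3, 4, 6]
print(pos)     # [0, 1, 2, 0, 0, 1]
```

Position IDs that reset at each boundary let the model tell where one sequence ends and the next begins inside the single packed row.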