Jasper Lu

Results 2 comments of Jasper Lu

Hm, part of this may be because torchtune uses fsdp2, and I was using fsdp1 through transformers. Is there any native way to use fsdp2 through the transformers Trainer today?...

I looked into this a little further and figured it out. The two main differences between torchtune and accelerate + transformers Trainer that make Qwen 32B trainable on torchtune are:...