Less Wright

Results: 97 comments by Less Wright

Separate issue for DeepGEMM here: https://github.com/pytorch/torchtitan/issues/889

#2 = PR https://github.com/pytorch/torchtitan/pull/1127

Well, the GEMMs are very performant, but they are inference only. They didn't release the backward portion, à la wgrad. From their issues discussion, it seems they are considering releasing it, but...
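For context on what the "backward portion" involves: a training-capable GEMM needs two extra matrix products beyond the forward pass, one for the input gradient (dgrad) and one for the weight gradient (wgrad). Below is a minimal pure-Python sketch, assuming a linear layer computed as Y = X W^T. The helper names are illustrative only and are not taken from DeepGEMM.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def linear_forward(x, w):
    # x: [M, K], w: [N, K] -> y: [M, N]
    return matmul(x, transpose(w))


def linear_backward(x, w, grad_y):
    # dgrad: gradient w.r.t. the input, grad_y @ w -> [M, K]
    grad_x = matmul(grad_y, w)
    # wgrad: gradient w.r.t. the weights, grad_y^T @ x -> [N, K]
    grad_w = matmul(transpose(grad_y), x)
    return grad_x, grad_w


# Tiny example (hypothetical values): M=3, K=2, N=4
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
y = linear_forward(x, w)

# Pretend the upstream gradient is all ones
grad_y = [[1.0] * 4 for _ in range(3)]
grad_x, grad_w = linear_backward(x, w, grad_y)
```

An inference-only kernel covers just `linear_forward`. Training additionally needs both products in `linear_backward`, and the wgrad product reduces over the M dimension, which is why it typically needs its own kernel.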

Working on a Triton implementation to support both inference and training. The bf16 forward version is in testing now.

Progress update: We landed a forward MG * NG grouped GEMM for DeepSeek inference this week (bf16)...you can run it using generate.py. This also has backward kernels but needs...
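As a reference for what a grouped GEMM computes, here is a minimal pure-Python sketch. It assumes that "MG * NG" means the activations for all groups are stacked along M and the per-group weights are stacked along N. That reading, along with the names `grouped_gemm_reference` and `m_sizes`, is an assumption for illustration and not the API of the Triton kernel.

```python
def grouped_gemm_reference(x, w, m_sizes):
    """Reference semantics for a grouped GEMM (not a fast implementation).

    x:       [sum(m_sizes), K] activation rows, stacked group by group
    w:       [G * N, K] weight rows, stacked group by group
    m_sizes: number of rows of x that belong to each of the G groups
    Returns y: [sum(m_sizes), N], where each group's rows are multiplied
    by the transpose of that group's own weight block.
    """
    num_groups = len(m_sizes)
    if num_groups == 0:
        raise ValueError("m_sizes must contain at least one group")
    if len(x) != sum(m_sizes):
        raise ValueError("rows of x must equal sum(m_sizes)")
    if len(w) % num_groups != 0:
        raise ValueError("rows of w must be divisible by the number of groups")

    n = len(w) // num_groups  # output features per group
    y = []
    row_start = 0
    for g, m in enumerate(m_sizes):
        w_g = w[g * n:(g + 1) * n]  # weight block for group g: [N, K]
        for row in x[row_start:row_start + m]:
            y.append([sum(a * b for a, b in zip(row, w_row)) for w_row in w_g])
        row_start += m
    return y


# Two groups (hypothetical values): group 0 owns 2 rows, group 1 owns 1 row
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [[1.0, 0.0], [0.0, 1.0],    # group 0 weights
     [1.0, 1.0], [2.0, -1.0]]   # group 1 weights
m_sizes = [2, 1]
y = grouped_gemm_reference(x, w, m_sizes)
```

In a mixture-of-experts layer each group would correspond to one expert, with `m_sizes` giving the number of tokens routed to it. A fused kernel performs all of these per-group products in a single launch rather than looping in Python.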

Thanks @vwxyzjn for the update! We have a cleaner version of DeepSeek now, so we can potentially integrate there, or just jump to MXFP8 directly.

Hi @ajWithNucleus, I'm no longer working on Titan, but maybe @tianyu-l or @danielvegamyhre can provide an update on whether there are any plans to integrate these training kernels.