wailord

3 comments by wailord

Thanks for your interest! It's not implemented yet, but the distributed logic is similar to [Moonlight](https://arxiv.org/abs/2502.16982).

Hi, similar to Muon, ROOT applies to tensors with ≥2 dimensions, while 1-D parameters (like biases and norm weights) are still optimized by AdamW.
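For illustration, the split described above could be sketched as follows. This is a minimal, framework-free sketch that works on parameter shapes only; the helper name `split_by_ndim` and the toy parameter names are hypothetical and are not taken from the ROOT code.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def split_by_ndim(named_shapes: Dict[str, Shape]) -> Tuple[List[str], List[str]]:
    """Partition parameter names by dimensionality.

    Hypothetical helper: names whose shape has >= 2 dimensions go to the
    matrix optimizer group (ROOT), everything else goes to the AdamW group.
    """
    matrix_names: List[str] = []
    adamw_names: List[str] = []
    for name, shape in named_shapes.items():
        if len(shape) >= 2:
            matrix_names.append(name)
        else:
            adamw_names.append(name)
    return matrix_names, adamw_names


# Toy model with only 2D and 1D parameters (names are made up).
toy_shapes = {
    "linear.weight": (64, 32),
    "linear.bias": (64,),
    "norm.weight": (64,),
}
root_group, adamw_group = split_by_ndim(toy_shapes)
```

With a real model, the same rule would be applied to each parameter's number of dimensions before building the two optimizer parameter groups.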

In the toy training script, we only have 2D and 1D parameters, so the current assertion works fine. For models with higher-dimensional parameters (such as convolution kernels in CNNs), reshaping is necessary. We'll update...
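As a rough sketch of what such reshaping might look like, one common convention is to keep the first dimension and flatten all trailing dimensions into the second. This is an assumption for illustration, not the planned ROOT update; the helper name `as_matrix_shape` is hypothetical.

```python
from math import prod
from typing import Tuple


def as_matrix_shape(shape: Tuple[int, ...]) -> Tuple[int, int]:
    """Return the 2D shape obtained by flattening all trailing dimensions.

    Assumed convention: (d0, d1, ..., dk) -> (d0, d1 * ... * dk).
    1-D parameters are rejected because they are handled by AdamW instead.
    """
    if len(shape) < 2:
        raise ValueError("expected a parameter with at least 2 dimensions")
    return (shape[0], prod(shape[1:]))


# A conv kernel with 16 output channels, 3 input channels, and a 3x3 window.
conv_matrix_shape = as_matrix_shape((16, 3, 3, 3))  # (16, 27)
```

In PyTorch this would correspond to something like `tensor.reshape(tensor.shape[0], -1)` before the matrix update, with the result reshaped back to the original shape afterwards.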