Koratahiu
> Thanks! I don't have an HV transformer file, so I'm going to merge without testing. Please double-check the PR with this in mind. I tested it on embedding training...
Why the layer filter, though? If it’s because of 1D params, we already reshape them to 2D effectively via the SMMF method (when `1D Vector Reshape` = True); see the sketch below.
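For illustration, a minimal sketch of what that 1D-to-2D reshaping could look like. The function name and the nearest-square folding strategy are assumptions here, not the PR's actual implementation:

```python
import math

import torch


def reshape_1d_to_2d(p: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: fold a 1D parameter into the most nearly
    square 2D matrix with the same number of elements."""
    n = p.numel()
    rows = int(math.isqrt(n))
    while n % rows != 0:  # largest divisor of n that is <= sqrt(n)
        rows -= 1
    return p.view(rows, n // rows)


# Example: a 768-element bias vector becomes a 24x32 matrix, so
# matrix-shaped update rules (e.g. Muon's orthogonalization) apply.
bias = torch.randn(768)
print(reshape_1d_to_2d(bias).shape)  # torch.Size([24, 32])
```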
I looked at their code:

```python
# optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.90, 0.95), weight_decay=0.01)
# To replace the above, do the following:
from muon import MuonWithAuxAdam
hidden_weights = [p for...
```
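For context, the snippet from the Muon repository's README continues roughly as below. This is reproduced from memory, so treat the exact attribute names (`model.body`, `model.head`, `model.embed`) and hyperparameter values as assumptions:

```python
from muon import MuonWithAuxAdam

# Hidden 2D weights get Muon; everything else gets the auxiliary AdamW.
hidden_weights = [p for p in model.body.parameters() if p.ndim >= 2]
hidden_gains_biases = [p for p in model.body.parameters() if p.ndim < 2]
nonhidden_params = [*model.head.parameters(), *model.embed.parameters()]
param_groups = [
    dict(params=hidden_weights, use_muon=True,
         lr=0.02, weight_decay=0.01),
    dict(params=hidden_gains_biases + nonhidden_params, use_muon=False,
         lr=3e-4, betas=(0.9, 0.95), weight_decay=0.01),
]
optimizer = MuonWithAuxAdam(param_groups)
```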
I did something very similar in the K-B PR, but it ended up changing a lot of files and was eventually reverted. It should be easy to implement again, though,...
Added MuonWithAuxAdam optimizer to TODO list
> would be great if you could add the original Muon as well, not only Muon_Adv, so we have a comparison to the original

Yeah, I meant as an option...
`MuonWithAuxAdam` is now available as an option for `Muon_adv`, for anyone who wants to test Muon as proposed by its author. It uses `ADAMW_ADV` (special UI for it inside...
- [x] In [PyTorch 2.9, Muon](https://docs.pytorch.org/docs/stable/generated/torch.optim.Muon.html#torch.optim.Muon) uses RMS scaling, which scales its updates to match Adam’s learning-rate range. This PR already has that feature, but it’s only...
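To make the RMS-matching idea concrete, here is a minimal sketch under the heuristic commonly cited in the Muon literature (an assumption here, not necessarily this PR's or PyTorch's exact code): an orthogonalized update of an m×n matrix has entry RMS near 1/sqrt(max(m, n)), so scaling by 0.2·sqrt(max(m, n)) brings it into AdamW's typical ~0.2 RMS range and lets both optimizers share one learning rate:

```python
import torch


def rms_matched_step(weight: torch.Tensor, update: torch.Tensor, lr: float) -> None:
    """Hedged sketch: apply an already-orthogonalized Muon update, scaled so
    its RMS roughly matches an Adam-style update (assumed heuristic)."""
    fan_out, fan_in = update.shape[-2], update.shape[-1]
    scale = 0.2 * max(fan_out, fan_in) ** 0.5  # target entry RMS ~= 0.2
    weight.add_(update, alpha=-lr * scale)
```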
> Given Dxq found regressions in features that shouldn't have regressed in 2.9.0, and that all the testing has been done on 2.8, there is almost no reason to upgrade....
> See comments

Reverted both the UIState and BaseConfig changes, and included the fix inside the Muon logic path.