optimi
Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers
The instructions say we should just cast the model weights to BF16, but wouldn't that chop a bunch of useful information when resuming from an existing checkpoint (e.g., for continued...
Code: https://github.com/kach/gradient-descent-the-ultimate-optimizer Paper: https://arxiv.org/abs/1909.13371 Because this method requires modifying the optimizer, it's not as popular as the common versions, but this repository refactors a lot of optimizers, so maybe it can...
Here is my implementation: https://github.com/kohya-ss/sd-scripts/pull/1381 I'm not sure if it is correct. https://github.com/kohya-ss/sd-scripts/blob/ed99b2180148258cde955106ce988781eca03006/sdxl_train.py#L502-L510
Hi, first, many thanks for the package; it's very useful. It works great with pure bf16 + Kahan. With fp16 I can't find a good set of settings to get stability....
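
For readers unfamiliar with the "pure bf16 + Kahan" setup mentioned above, here is a minimal sketch of why Kahan summation helps in low precision. It is not optimi's implementation: it simulates bfloat16 rounding in plain Python, and the names `to_bf16`, `kahan_step`, and `simulate` are hypothetical helpers made up for this illustration. The idea is that an update much smaller than the spacing between adjacent bf16 values is rounded away when added directly to a weight, while a compensation buffer accumulates the lost part until it is large enough to change the weight.

```python
import struct


def to_bf16(x: float) -> float:
    """Simulate rounding a value to bfloat16 (round-to-nearest-even).

    bfloat16 is the upper 16 bits of a float32, so we round away the
    lower 16 bits of the float32 bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def kahan_step(w: float, c: float, update: float) -> tuple[float, float]:
    """One Kahan-compensated update; w and c are both kept in simulated bf16."""
    c = to_bf16(c + update)        # fold the new update into the compensation
    new_w = to_bf16(w + c)         # try to apply it to the weight
    c = to_bf16(c - (new_w - w))   # keep whatever the weight could not absorb
    return new_w, c


def simulate(steps: int, update: float) -> tuple[float, float]:
    """Apply the same small update `steps` times, with and without Kahan."""
    plain = 1.0
    w, c = 1.0, 0.0
    for _ in range(steps):
        plain = to_bf16(plain + update)
        w, c = kahan_step(w, c, update)
    return plain, w


if __name__ == "__main__":
    plain, kahan = simulate(steps=1000, update=1e-3)
    print("plain bf16:", plain)   # the update is rounded away at every step
    print("kahan bf16:", kahan)   # should land close to the exact result, 2.0
```

In this toy setting the plain bf16 weight never moves from 1.0, because near 1.0 adjacent bf16 values are 2^-7 apart and an update of 1e-3 is less than half that spacing. The compensated weight tracks the exact sum to within a few bf16 steps, at the cost of one extra buffer per parameter.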