optimi issues

Casting Existing FP32/FP16 model weights to BF16 + Kahan Summation

The instructions say we should just cast the model weights to BF16, but wouldn't that chop a bunch of useful information when resuming from an existing checkpoint (e.g., for continued...

zaptrem
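
To make the question above concrete, here is a minimal sketch in plain Python (standard library only) of what a BF16 cast discards and how a Kahan-style compensation buffer recovers small updates that plain BF16 addition would round away. The helper names `to_bf16`, `plain_step`, `kahan_step`, and `run` are hypothetical and exist only for this illustration; this is not optimi's code, and the step size and step count are arbitrary assumptions.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Illustrative emulation only: NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)  # round on the 16 low bits bf16 drops
    bits &= 0xFFFF0000  # keep sign, 8 exponent bits, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def plain_step(w: float, update: float) -> float:
    """Apply an update directly to a bf16 weight."""
    return to_bf16(w + update)


def kahan_step(w: float, c: float, update: float) -> tuple[float, float]:
    """Apply an update through a bf16 compensation buffer c."""
    c = to_bf16(c + update)        # accumulate the update in the buffer
    new_w = to_bf16(w + c)         # try to apply the buffered amount
    c = to_bf16(c + (w - new_w))   # keep whatever the weight did not absorb
    return new_w, c


def run(steps: int = 1000, update: float = -1e-4):
    """Apply the same small update repeatedly, with and without compensation."""
    plain = 1.0
    w, c = 1.0, 0.0
    for _ in range(steps):
        plain = plain_step(plain, update)
        w, c = kahan_step(w, c, update)
    return plain, w, c


# 1) Information lost by the cast itself (relative rounding error of one weight).
original = 0.123456789
cast_error = abs(to_bf16(original) - original) / original

# 2) Small updates: the exact result would be 1.0 - 1000 * 1e-4 = 0.9.
plain, w, c = run()

print(f"relative cast error: {cast_error:.2e}")
print(f"plain bf16 weight:   {plain}")
print(f"kahan bf16 weight:   {w} (weight + buffer = {w + c})")
```

In this sketch the one-time cast rounds each weight to about 8 significant bits, and the compensation buffer does not undo that rounding. What it does is keep track of update amounts that are too small to change the BF16 weight on their own, so they are applied once they add up.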

[Feature Request] gradient-descent-the-ultimate-optimizer

Code: https://github.com/kach/gradient-descent-the-ultimate-optimizer Paper: https://arxiv.org/abs/1909.13371 Because this method requires modifying the optimizer, it is not as popular as the common optimizers, but this repository refactors a lot of optimizers, so maybe it can...

sdbds

When using accelerator with Gradient Release hook, Increasing VRAM consumption

2

Here is my implementation: https://github.com/kohya-ss/sd-scripts/pull/1381 I'm not sure whether it is correct. https://github.com/kohya-ss/sd-scripts/blob/ed99b2180148258cde955106ce988781eca03006/sdxl_train.py#L502-L510

sdbds

pure fp16 ?

3

Hi, first, many thanks for the package; it is very useful. It works great with pure bf16 + Kahan. With fp16, I can't find a good set of settings to get stability....

vince62s
