optimi
Casting Existing FP32/FP16 model weights to BF16 + Kahan Summation
The instructions say to just cast the model weights to BF16, but wouldn't that throw away a lot of useful information when resuming from an existing higher-precision checkpoint (e.g., for continued pretraining)? Is there a way to initialize the Kahan summation buffers to the right values from a high-precision checkpoint, or is this already automated somehow?
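To make the question concrete, here is a minimal sketch in plain PyTorch of the kind of initialization I have in mind. It splits an FP32 tensor into a BF16 weight plus a BF16 residual holding what the cast rounded away. The helper name `split_fp32_for_kahan` is hypothetical, and I am assuming (not claiming) that the residual could serve as the initial compensation buffer; the sign convention and the way the optimizer actually stores its buffers would need to be checked against the implementation.

```python
import torch


def split_fp32_for_kahan(w_fp32: torch.Tensor):
    """Split an FP32 tensor into a BF16 weight and a BF16 residual.

    Hypothetical helper for illustration only; it is not part of any library.
    """
    # Plain cast: rounds each value to the nearest BF16 number.
    w_bf16 = w_fp32.to(torch.bfloat16)
    # The part of the FP32 value that the cast rounded away, itself stored in BF16.
    # Assumption: the compensation buffer is added to the weight; flip the sign
    # if the implementation subtracts it instead.
    residual = (w_fp32 - w_bf16.float()).to(torch.bfloat16)
    return w_bf16, residual


torch.manual_seed(0)
w = torch.randn(1024, dtype=torch.float32)  # stand-in for a checkpoint tensor
w_bf16, residual = split_fp32_for_kahan(w)

# Worst-case error of the plain cast vs. the cast plus the stored residual.
err_cast_only = (w - w_bf16.float()).abs().max()
err_with_residual = (w - (w_bf16.float() + residual.float())).abs().max()
print(f"cast only:     {err_cast_only.item():.3e}")
print(f"with residual: {err_with_residual.item():.3e}")
```

If the buffers were seeded this way, the weight and residual together would carry more of the checkpoint's precision than the BF16 cast alone, which is the information I am worried about losing.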