Casting Existing FP32/FP16 model weights to BF16 + Kahan Summation

Open zaptrem opened this issue 1 year ago • 0 comments

The instructions say we should just cast the model weights to BF16, but wouldn't that throw away a lot of useful precision when resuming from an existing FP32/FP16 checkpoint (e.g., for continued pretraining)? Is there a way to initialize the Kahan summation buffers to the right values from a high-precision checkpoint, or is this already handled automatically?
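To make the question concrete, here is a rough sketch of what I mean by "the right values": keep the BF16-rounded weight and store the rounding error as the initial compensation term. This is plain Python that emulates BF16 rounding on a single scalar. The helper names `to_bf16` and `split_weight` are made up for illustration and are not part of any library, and the sign convention (compensation = high-precision value minus rounded value) is my assumption about how the buffer would be applied.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    Emulates BF16 by keeping the top 16 bits of the FP32 bit pattern with
    round-to-nearest-even. NaN and infinity are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias for round-to-nearest-even on the 16 bits that get dropped.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def split_weight(w_high: float) -> tuple[float, float]:
    """Split a high-precision weight into (bf16 weight, bf16 compensation).

    Assumed convention: compensation = w_high - bf16(w_high), itself stored
    in BF16, so that weight + compensation approximates w_high.
    """
    w_bf16 = to_bf16(w_high)
    comp = to_bf16(w_high - w_bf16)
    return w_bf16, comp


if __name__ == "__main__":
    w = 0.1
    w_bf16, comp = split_weight(w)
    print("plain cast error:      ", abs(w_bf16 - w))
    print("with compensation term:", abs((w_bf16 + comp) - w))
```

If the optimizer's buffer uses the opposite sign, or starts from zeros on load, the initialization would need to be adapted accordingly.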

Jul 30 '24 01:07 zaptrem