Tymoteusz Dolega
How is this a bug?
Any updates on this?
Setting bf16 saves the model weights in bf16, but the gradients (computed in half precision and converted to full precision for the update) and the optimizer states are still kept in full precision.
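For reference, a minimal sketch of where that flag lives, assuming a Hugging Face `TrainingArguments` setup (the output directory and batch size are placeholders):

```python
from transformers import TrainingArguments

# With bf16=True the forward/backward pass runs in bfloat16, while the AdamW
# optimizer keeps its states (and the fp32 copy used for the update) in full precision.
args = TrainingArguments(
    output_dir="out",                 # placeholder output directory
    bf16=True,                        # mixed-precision training in bfloat16
    per_device_train_batch_size=4,    # placeholder batch size
)
```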
What are your LoRA parameters? 4 GiB can be normal with a high `r` when the adapter is saved as float32.
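If it helps to check, here is a hedged sketch of how to inspect the trainable-parameter count with PEFT; the model name and hyperparameters below are assumptions, not your actual config. A float32 adapter takes roughly 4 bytes per trainable parameter, so a high `r` over many target modules can easily reach several GiB.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed model
lora = LoraConfig(
    r=256,                                                   # hypothetical high rank
    lora_alpha=512,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # trainable params drive the adapter file size
```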
I suggest using … for the pad token and … for the eos token.
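For illustration only, a sketch of the mechanics of setting those tokens; the token strings and model name below are generic placeholders, not the specific tokens suggested above (those were lost from the original comment):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder special tokens -- replace with whichever tokens you settle on.
tokenizer.add_special_tokens({"pad_token": "<pad>", "eos_token": "</s>"})
model.resize_token_embeddings(len(tokenizer))  # resize embeddings if new tokens were added
```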