Tymoteusz Dolega

5 comments by Tymoteusz Dolega

Setting bf16 stores the model weights in bf16, but the gradients (computed in half precision, then converted to full precision for the update) and the optimizer states are still handled in...
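A rough per-parameter memory estimate helps show why enabling bf16 alone does not shrink training memory much. This is a minimal sketch, not any library's accounting. It assumes an Adam-style optimizer with two moment buffers. It also assumes that gradients and optimizer states are kept in fp32; the sentence above is cut off, so the fp32 optimizer states are an assumption. The helper `training_bytes_per_param` is hypothetical.

```python
# Bytes per element for the dtypes involved.
DTYPE_BYTES = {"fp32": 4, "bf16": 2}


def training_bytes_per_param(weight_dtype="bf16", grad_dtype="fp32",
                             state_dtype="fp32", num_states=2):
    """Estimate training memory per parameter, in bytes.

    Hypothetical helper. It assumes an Adam-style optimizer with
    `num_states` moment buffers (fp32 optimizer states are an assumption).
    It ignores activations and any separate fp32 master-weight copy.
    """
    weights = DTYPE_BYTES[weight_dtype]
    grads = DTYPE_BYTES[grad_dtype]
    states = num_states * DTYPE_BYTES[state_dtype]
    return weights + grads + states


# bf16 weights, fp32 gradients, two fp32 Adam moments: 2 + 4 + 8 bytes.
per_param = training_bytes_per_param()
# Everything in fp32 for comparison: 4 + 4 + 8 bytes.
all_fp32 = training_bytes_per_param(weight_dtype="fp32")
print(per_param, all_fp32)
```

Under these assumptions, switching only the weights to bf16 saves 2 of 16 bytes per parameter. The gradients and optimizer states account for most of the footprint.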

What are your LoRA parameters? 4 GiB can be normal for an adapter with a high `r` that is saved as float32.
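The adapter size can be estimated directly from the adapted layers. For each weight matrix of shape `d_in x d_out`, LoRA adds an `A` matrix (`r x d_in`) and a `B` matrix (`d_out x r`), which gives `r * (d_in + d_out)` parameters. Saving in float32 uses 4 bytes per parameter. This is a sketch under an assumed model shape. The Llama-7B-like dimensions and the choice of target modules are hypothetical and are not taken from the question.

```python
def lora_param_count(layer_shapes, r):
    """Count LoRA parameters for a list of (d_in, d_out) adapted layers.

    Each adapted layer gets A (r x d_in) and B (d_out x r),
    which is r * (d_in + d_out) parameters.
    """
    return sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)


def adapter_size_gib(layer_shapes, r, bytes_per_param=4):
    """Adapter size in GiB; 4 bytes per parameter corresponds to float32."""
    return lora_param_count(layer_shapes, r) * bytes_per_param / 2**30


# Hypothetical Llama-7B-like config: 32 decoder layers, hidden size 4096,
# MLP size 11008, with LoRA on q/k/v/o plus gate/up/down projections.
HIDDEN, MLP, LAYERS = 4096, 11008, 32
per_layer = [(HIDDEN, HIDDEN)] * 4 + [(HIDDEN, MLP)] * 2 + [(MLP, HIDDEN)]
shapes = per_layer * LAYERS

for r in (8, 64, 512):
    fp32 = adapter_size_gib(shapes, r)
    bf16 = adapter_size_gib(shapes, r, bytes_per_param=2)
    print(f"r={r}: {fp32:.2f} GiB as float32, {bf16:.2f} GiB as bf16")
```

The size grows linearly with `r`, so under these assumptions a rank in the hundreds is enough to reach several GiB in float32. Saving in a 2-byte dtype halves the file.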

I suggest using for pad token and for eos token.
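Label masking is one common reason to use different tokens for padding and end-of-sequence. Many fine-tuning setups copy `input_ids` into `labels` and set every padding position to `-100`. That is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. If the pad id equals the eos id, the real eos position is masked as well. This is a minimal pure-Python sketch: the `mask_padding` helper and all token ids are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_padding(input_ids, pad_id):
    """Copy input_ids to labels, ignoring every position equal to pad_id."""
    return [IGNORE_INDEX if tok == pad_id else tok for tok in input_ids]


# Hypothetical token ids: 1 = bos, 2 = eos, 0 = a separate pad token.
EOS = 2
sequence = [1, 450, 3681, EOS]  # bos, two text tokens, eos

# Pad token reused as eos: the real eos label is masked too,
# so the model gets no training signal to stop generating.
same = mask_padding(sequence + [EOS, EOS], pad_id=EOS)
# Distinct pad token: only the padding is masked, and eos stays a target.
distinct = mask_padding(sequence + [0, 0], pad_id=0)
print(same)      # [1, 450, 3681, -100, -100, -100]
print(distinct)  # [1, 450, 3681, 2, -100, -100]
```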