brianchmiel
@JonasGeiping Dropbox doesn't allow downloading such large files. Could you upload the dataset "data=c4-subset-processed" to Google Drive or another service that allows us to download it? Thanks!
I encountered the same ppt problem on PTB. Is there any update on how to resolve it?
Thank you for your answer. So what is the reason you define the first moment as a uint8 datatype? https://github.com/Azure/MS-AMP/blob/0a2cd721fa68ee725e3b2fb132df02ddb8069d62/msamp/__init__.py#L81C9-L81C23
Thank you for your answer. I am asking about the training: why did you decide to train in bf16 rather than fp8 (as DeepSeek did)? Did you find any instability there? In...