FBLGit
My PRs are always open to anyone who wants to help; I keep the branches updated on my end. To be honest, the scope of this one has...
The high grad_norm comes from a big mismatch between what the model was originally fitted to and what it is being shown now. This could be a wrong pad/eos/bos token.. it could...
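One way to check for the pad/eos/bos mismatch mentioned above is to compare the special-token IDs the tokenizer uses against the ones the model config expects. This is only a minimal sketch: the helper name `find_special_token_mismatches` and the ID values below are made up for illustration and do not come from the original thread.

```python
# Hypothetical helper: compares special-token IDs from two sources
# (e.g. a tokenizer and a model config) and reports any that differ.
SPECIAL_TOKEN_NAMES = ("pad_token_id", "bos_token_id", "eos_token_id")


def find_special_token_mismatches(tokenizer_ids, model_ids):
    """Return {name: (tokenizer_id, model_id)} for every special token that differs."""
    return {
        name: (tokenizer_ids.get(name), model_ids.get(name))
        for name in SPECIAL_TOKEN_NAMES
        if tokenizer_ids.get(name) != model_ids.get(name)
    }


# Illustrative values only (assumed, not taken from a real model).
tokenizer_ids = {"pad_token_id": 0, "bos_token_id": 1, "eos_token_id": 2}
model_ids = {"pad_token_id": None, "bos_token_id": 1, "eos_token_id": 32000}

mismatches = find_special_token_mismatches(tokenizer_ids, model_ids)
print(mismatches)
```

With Hugging Face `transformers`, the two dictionaries would typically be filled from attributes such as `tokenizer.pad_token_id` and `model.config.pad_token_id`. A non-empty result is a hint worth investigating, not proof that it causes the high grad_norm.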
I run bf16 with no issues using deepspeed, accelerate & axo:
```
bf16: true
fp16:
bfloat16: true
```
and `--mixed_precision bf16` as the accelerate launch argument. But here I have...
Hmm.. the frontend package is not being pushed 👎 and needs a fix. Semver alignment should also be followed to keep things consistent. @anshul7665 can you give a quick hand on this...
Trying to address this issue in #117
This should be reconsidered. The concern about plaguing the codebase with CUDA dependencies is valid.. we should address the design constraints to make this happen and not close the door...