Results: 2 issues by brianchmiel
Hi, I have some questions related to the paper: 1) Which FP8 format (E4M3 / E5M2) do you use for the first Adam moment? Do you use delayed scaling or...
Hi, in the report you mentioned that you decided not to train in FP8 because you found performance degradation. Can you please explain where you found it? Is it already in...