Jiaxiang Zheng
Results: 2 comments by Jiaxiang Zheng
The FP16/BF16 **1979 TFLOPS** figure in the H200 spec is the with-sparsity number, so I think the actual MFU, measured against the dense peak, should be `420/(1979/2)=42.45%`.
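To make the arithmetic easy to check, here is a minimal sketch of that calculation. The helper name `mfu` and the constant names are just illustrative, and it assumes the dense peak is exactly half of the with-sparsity figure, as in the formula above.

```python
def mfu(achieved_tflops: float, peak_tflops: float) -> float:
    """Return model FLOPs utilization as a fraction of the given peak."""
    return achieved_tflops / peak_tflops


# FP16/BF16 figure quoted from the spec (with sparsity).
SPARSE_PEAK_TFLOPS = 1979.0
# Assumption: the dense peak is half of the with-sparsity number.
DENSE_PEAK_TFLOPS = SPARSE_PEAK_TFLOPS / 2
# Achieved throughput reported in the discussion.
ACHIEVED_TFLOPS = 420.0

mfu_dense = mfu(ACHIEVED_TFLOPS, DENSE_PEAK_TFLOPS)
mfu_sparse = mfu(ACHIEVED_TFLOPS, SPARSE_PEAK_TFLOPS)

print(f"MFU vs dense peak:  {mfu_dense:.2%}")
print(f"MFU vs sparse peak: {mfu_sparse:.2%}")
```

Dividing by the with-sparsity peak instead would report half the utilization, which is why the choice of denominator matters here.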
I think this is because it expects to load additional metadata stored in the checkpoint. You can refer to `load_checkpoint`, defined in `megatron/training/checkpointing.py`, to see how the returned state_dict...