stephenrawls
When running ZeRO Stage 3 with NVMe offload on a 10B-parameter model, I am observing roughly 2.3 TFLOPS/GPU (whereas we expect to see closer to 30-40 TFLOPS/GPU for V100...