Checkpointing crashes with ZeRO optimizer

Open • norabelrose opened this issue 2 years ago • 0 comments

Describe the bug: Checkpointing crashes when --zero is set. The error "RuntimeError: Tensors must be CUDA and dense" is thrown inside the method consolidate_state_dict().

Expected behavior: Checkpointing should complete without crashing when --zero is set.

Screenshots: Screenshot 2023-05-14 at 12:01:03 PM

norabelrose, May 14 '23 19:05