tuned-lens
Checkpointing crashes with ZeRO optimizer
Describe the bug
Checkpointing crashes when --zero is set. The error RuntimeError: Tensors must be CUDA and dense is raised inside the method consolidate_state_dict().
Expected behavior
Checkpointing should complete without crashing when --zero is set.