Megatron-LM
[QUESTION] Why enable `non_blocking=True` when doing synchronous D2H?
The comment on line 76 of filesystem_async.py indicates that Megatron performs synchronous Device-to-Host (D2H) transfers for checkpointing. However, line 94 passes `non_blocking=True` for these transfers (code link). I did not find any explicit CUDA stream or event synchronization in the subsequent steps of the checkpointing process. Could this omission introduce data-integrity risks, such as saving incomplete CPU tensors to disk?
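To make the concern concrete, here is a minimal sketch of the kind of synchronization I expected to see after a non-blocking D2H copy. It is not Megatron's code: `copy_to_host_safely` and `demo` are hypothetical names, and the pinned staging buffer is my own assumption about how such a copy would be set up. It uses only standard PyTorch calls (`Tensor.copy_`, `torch.cuda.Event`).

```python
try:
    import torch
except ImportError:  # lets the sketch be imported on machines without PyTorch
    torch = None


def copy_to_host_safely(gpu_tensor):
    """Copy a CUDA tensor to host memory and wait until the copy has finished.

    Hypothetical helper for illustration only; not part of Megatron-LM.
    """
    # A pinned (page-locked) host buffer is what allows the D2H copy to be
    # truly asynchronous with respect to the host thread.
    host_buf = torch.empty(
        gpu_tensor.shape, dtype=gpu_tensor.dtype, device="cpu", pin_memory=True
    )
    # Enqueue the copy on the current CUDA stream and return immediately.
    host_buf.copy_(gpu_tensor, non_blocking=True)
    # Record an event after the copy on the same stream, then block the host
    # until the event (and therefore the copy) has completed.
    done = torch.cuda.Event()
    done.record()
    done.synchronize()
    return host_buf


def demo():
    """Return True if the synchronized copy matches the source, None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    src = torch.arange(1024, device="cuda", dtype=torch.float32)
    dst = copy_to_host_safely(src)
    return bool(torch.equal(dst, src.cpu()))
```

Without the `done.synchronize()` call (or an equivalent such as `torch.cuda.synchronize()`), the host thread could read `host_buf` before the copy has landed. Whether Megatron already synchronizes somewhere I have missed is exactly what I am asking.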
FYI:
Marking as stale. No activity in 60 days.