
[QUESTION] Why use weights_only=False when loading a checkpoint?

Open Suparjie opened this issue 11 months ago • 1 comments


In the commit below, weights_only is set to False when torch.load loads the checkpoint. Why? https://github.com/NVIDIA/Megatron-LM/commit/eee2bc9a74ba9cba70d8fbe0e7384d1ea243f904

Suparjie, May 13 '25 02:05

I think this is because the loader expects additional metadata stored in the checkpoint alongside the weights. See load_checkpoint in megatron/training/checkpointing.py for how the returned state_dict is used.

JiaxiangZheng, May 20 '25 14:05

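From Torch 2.6 onward, torch.load defaults to weights_only=True, and that mode fails on checkpoints that store objects other than tensors and basic containers unless those types are allowlisted.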

wplf, Jul 03 '25 10:07
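The two replies above can be reproduced with a small sketch. The checkpoint layout here (a "model" dict, an "args" entry holding an argparse.Namespace, and an "iteration" counter) is an assumption made for illustration, not a statement of Megatron-LM's exact format. It shows that the restricted loader rejects the non-tensor object while the full unpickler accepts it.

```python
import argparse
import io
import pickle

import torch

# Hypothetical checkpoint: tensors plus a non-tensor metadata object.
checkpoint = {
    "model": {"weight": torch.ones(2, 2)},
    "args": argparse.Namespace(hidden_size=4, num_layers=2),
    "iteration": 100,
}

# Serialize to an in-memory buffer so the example needs no files on disk.
buffer = io.BytesIO()
torch.save(checkpoint, buffer)

# weights_only=True uses a restricted unpickler, which refuses to
# reconstruct argparse.Namespace because it is not allowlisted by default.
buffer.seek(0)
try:
    torch.load(buffer, weights_only=True)
    strict_load_failed = False
except pickle.UnpicklingError as err:
    strict_load_failed = True
    print("weights_only=True rejected the checkpoint:", type(err).__name__)

# weights_only=False falls back to the full pickle machinery and succeeds.
# Only do this for files from a trusted source.
buffer.seek(0)
full = torch.load(buffer, weights_only=False)
print(full["args"].hidden_size, full["iteration"])
```

Passing weights_only explicitly, as here, keeps the behavior the same regardless of which default the installed PyTorch version uses.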

Torch 2.6 sets weights_only to True by default to avoid RCE. However, is there a security issue if we explicitly set weights_only to False here? For example, a malicious checkpoint file could contain a payload that leads to arbitrary code execution.

cnspary, Jul 10 '25 04:07
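The concern raised above comes from how pickle works: an object can name any importable callable to be invoked during loading. The sketch below uses only the standard library and a deliberately harmless payload. The names Payload and AllowlistUnpickler are hypothetical, and the restricted unpickler is an analogy for the idea behind weights_only=True, not PyTorch's actual implementation.

```python
import io
import pickle


class Payload:
    """Stand-in for a malicious object; this one only evaluates arithmetic."""

    def __reduce__(self):
        # pickle will call eval("6 * 7") at load time. A real attacker
        # could name any importable callable and arguments here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Unrestricted loading runs the embedded call and returns its result.
unsafe_result = pickle.loads(blob)
print("unrestricted load returned:", unsafe_result)


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only explicitly allowlisted globals."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self._allowed = set(allowed)

    def find_class(self, module, name):
        if (module, name) in self._allowed:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowlisted")


# With an empty allowlist, the reference to builtins.eval is refused
# before anything is executed.
try:
    AllowlistUnpickler(io.BytesIO(blob), allowed=[]).load()
    blocked = False
except pickle.UnpicklingError as err:
    blocked = True
    print("restricted load refused:", err)
```

So an explicit weights_only=False is only as safe as the source of the checkpoint file: it is reasonable for files you produced yourself and risky for files downloaded from elsewhere.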

@cnspary you can allowlist specific classes if needed while keeping weights_only=True. More info here: https://github.com/NVIDIA/Megatron-LM/blob/main/docs/source/api-guide/dist_checkpointing.rst#safe-checkpoint-loading

sbhavani, Nov 13 '25 05:11