
Help! RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

Open doqizuo opened this issue 1 year ago • 4 comments

```
creating data loader...
creating model and diffusion...
training...
Traceback (most recent call last):
  File "scripts/segmentation_train.py", line 118, in <module>
    main()
  File "scripts/segmentation_train.py", line 70, in main
    TrainLoop(
  File "D:\MedSegDiff-master.\guided_diffusion\train_util.py", line 83, in __init__
    self._load_and_sync_parameters()
  File "D:\MedSegDiff-master.\guided_diffusion\train_util.py", line 139, in _load_and_sync_parameters
    dist_util.sync_params(self.model.parameters())
  File "D:\MedSegDiff-master.\guided_diffusion\dist_util.py", line 111, in sync_params
    dist.broadcast(p, 0)
  File "D:\Anaconda\envs\sg\lib\site-packages\torch\distributed\distributed_c10d.py", line 1195, in broadcast
    work.wait()
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
```

Please tell me where the problem lies and why. Hope someone kind can help me ~ Thanks!!!

doqizuo avatar Nov 29 '24 14:11 doqizuo
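For context, PyTorch raises this message whenever an in-place write targets a leaf tensor with `requires_grad=True`; here the write is the `dist.broadcast(p, 0)` into each model parameter inside `sync_params`. A minimal, illustrative sketch of the failure mode (the tensor below is made up, not from the repo):

```python
import torch

# Any in-place write into a leaf tensor that requires grad triggers the
# same RuntimeError as the broadcast in sync_params.
p = torch.zeros(3, requires_grad=True)  # a leaf tensor that requires grad
try:
    p.add_(1.0)  # in-place operation on that leaf
except RuntimeError as e:
    print(e)  # "a leaf Variable that requires grad is being used in an in-place operation."
```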

Try adding `p = p + 0` in the `sync_params` function within `dist_util.py`, as follows:

```python
def sync_params(params):
    """
    Synchronize a sequence of tensors across ranks from rank 0.
    """
    for p in params:
        with th.no_grad():
            p = p + 0
            dist.broadcast(p, 0)
```
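Why this sidesteps the error: inside `th.no_grad()`, `p + 0` produces a fresh tensor that does not require grad, so the in-place broadcast then targets that copy instead of the original leaf parameter. A small illustrative sketch of the difference (names are made up):

```python
import torch

p = torch.zeros(3, requires_grad=True)  # leaf parameter, as in the model
with torch.no_grad():
    q = p + 0    # detached copy: q.requires_grad is False
    q.add_(1.0)  # in-place write on the copy succeeds, no RuntimeError
print(q.requires_grad)  # False
```

Note that because `p` is rebound to the copy, it is the copy that gets broadcast rather than the original parameter; on a single-process (single-GPU) run there is nothing to synchronize anyway, so the change effectively just silences the error.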

Mnk208 avatar Dec 04 '24 01:12 Mnk208

#84

Mnk208 avatar Dec 04 '24 01:12 Mnk208


> Try adding `p = p + 0` in the `sync_params` function within `dist_util.py` as follows: […]

THANKSSSSSS!!

doqizuo avatar Dec 06 '24 13:12 doqizuo