ColossalAI

[BUG]: RuntimeError(f"Parameter model.lm.parameters failed at the gradient reduction. " "Some unsupported torch function is operated upon this parameter.")

Open Youly172 opened this issue 2 years ago • 1 comment

๐Ÿ› Describe the bug

I am using the colossal_gemini strategy. I printed chunk.tensors_info[p].state, and the debug output showed TensorState.HOLD and TensorState.HOLD_AFTER_BWD, after which this error is raised: RuntimeError(f"Parameter model.lm.parameters failed at the gradient reduction. " "Some unsupported torch function is operated upon this parameter."). I don't know how to solve it. What's wrong here?
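One possible reading of the printed values is that the parameter was still in TensorState.HOLD when the gradient reduction step expected TensorState.HOLD_AFTER_BWD. The toy sketch below only models that reading. The enum and the function `check_before_grad_reduction` are hypothetical stand-ins written for illustration, not ColossalAI code, and the real enum likely has more states.

```python
from enum import Enum, auto


class TensorState(Enum):
    # Only the two states named in the report; this is a toy enum.
    HOLD = auto()
    HOLD_AFTER_BWD = auto()


def check_before_grad_reduction(param_name: str, state: TensorState) -> None:
    """Hypothetical stand-in for a state check before gradient reduction.

    Assumption: reduction is only allowed once the parameter has been
    marked HOLD_AFTER_BWD; any other state is reported as an error.
    """
    if state is not TensorState.HOLD_AFTER_BWD:
        raise RuntimeError(
            f"Parameter `{param_name}` failed at the gradient reduction. "
            "Some unsupported torch function is operated upon this parameter."
        )


# A parameter in the expected state passes silently.
check_before_grad_reduction("ok_param", TensorState.HOLD_AFTER_BWD)

# A parameter still in HOLD reproduces the message from the report.
try:
    check_before_grad_reduction("model.lm.parameters", TensorState.HOLD)
except RuntimeError as err:
    print(err)
```

If this reading is right, the question becomes why the parameter never left HOLD during the backward pass, which the error message attributes to an unsupported torch function touching it.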

Environment

No response

Youly172 avatar Apr 14 '23 17:04 Youly172

Hi @Youly172, could you please provide more details to help us reproduce it? For example: which example you ran, the command, your environment, and any changes you made.

binmakeswell avatar Apr 18 '23 07:04 binmakeswell

I also encountered the same error. Did you manage to resolve it later on?

gaylong9 avatar Jul 17 '23 01:07 gaylong9

The mail has been received~ Li Qiaoyan

Youly172 avatar Jul 17 '23 01:07 Youly172

I'm using colossalai version 0.3.0 and hit this error: RuntimeError: Parameter "tor_bond_conv.batch_norm.bias" failed at the gradient reduction. Some unsupported torch function is operated upon this parameter. In gemini_plugin.py, I found a comment saying that ColossalAI's ZeRO support is currently not optimal, next to the commented-out line model = nn.SyncBatchNorm.convert_sync_batchnorm(model, None). Since the failing parameter belongs to a batch norm layer, I suspected the Batch Normalization layers in the model were the cause. However, even after uncommenting that line, the same error persists.

gaylong9 avatar Jul 17 '23 03:07 gaylong9