
[BUG]: Conflict with Deepspeed and colossalAI for fused adam

Open · Fazziekey opened this issue 2 years ago · 2 comments

🐛 Describe the bug

When the Anaconda environment has both DeepSpeed and ColossalAI installed and we use DeepSpeed's ZeRO optimizer, I get an error loading Fused Adam. Both DeepSpeed and ColossalAI put their kernel code in .conda/envs/chat/lib/python3.9/site-packages/op_builder, so the two installs conflict.

related issue in deepspeed: https://github.com/microsoft/DeepSpeed/issues/2874

Environment

No response

Fazziekey · Feb 22 '23 07:02

How did you install colossalai? There is no op_builder/ folder in my site-packages/ folder.

ver217 · Feb 22 '23 07:02

@FrankLeeeee and @ver217, I just finished debugging an issue on the DeepSpeed side that seems closely related to this one.

I can confirm that op_builder is being installed into site-packages. I was able to reproduce this by running pip install . from a clone of the latest ColossalAI repo. One way to see this is to try uninstalling colossalai:

[Screenshot: pip uninstall colossalai output]
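To check your own environment, here is a minimal sketch (not part of either project) that uses only the standard library's importlib.metadata to list which installed distributions ship files under a top-level op_builder/ folder. The function name owners_of_top_level is made up for illustration. It relies on each package's RECORD file list, so packages installed without one (some editable or legacy installs) will not show up.

```python
from importlib import metadata


def owners_of_top_level(name: str) -> list:
    """Return the names of installed distributions whose recorded files
    live under a top-level folder called `name` (e.g. "op_builder")."""
    owners = set()
    for dist in metadata.distributions():
        files = dist.files or []  # None when the distribution has no RECORD
        if any(f.parts and f.parts[0] == name for f in files):
            dist_name = dist.metadata["Name"]
            if dist_name:  # skip distributions with broken metadata
                owners.add(dist_name)
    return sorted(owners)


if __name__ == "__main__":
    # More than one name here means several packages write into
    # site-packages/op_builder and can overwrite each other.
    print(owners_of_top_level("op_builder"))
```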

Please also see this issue: https://github.com/microsoft/DeepSpeed/issues/2904 and my related PR: https://github.com/microsoft/DeepSpeed/pull/2963, where I am trying to work around this issue.

I should note that we've had similar issues in the past where we accidentally installed packages into site-packages (https://github.com/microsoft/DeepSpeed/issues/2690). They can usually be resolved by adding an explicit exclude for folders like op_builder in your setup.py here:

https://github.com/hpcaitech/ColossalAI/blob/ea0b52c12ee7598fe126ed0d8b0557f7e8a0e999/setup.py#L154-L164

jeffra · Mar 07 '23 23:03