Mohamad Zamini
> ```
> @microsoft-github-policy-service agree
> ```

@microsoft-github-policy-service agree
> @mzamini92 could you rebase so we can double check no tests are breaking with this and we can merge? Thanks!

@muellerzr Thanks for reaching out. I did it based...
With 2 H100s I can run the code, but I get an OOM error. When I increase to more than 2 GPUs, the model is duplicated on each GPU instead of being sharded and...
Same issue, +1.

- OS: Linux
- GPU count and types: 8 x H100 80GB
- (if applicable) Hugging Face Transformers/Accelerate/etc. versions: [email protected] & 4.42
- Python version: 3.10
- deepspeed: 0.15.2 & 0.14.2 &...