Jeffrey Quesnelle

11 comments by Jeffrey Quesnelle

Fixed for me as well on Ubuntu 20.04 with an A100 80GB (I suspect Python 3.8 is the culprit)

I'm seeing the same thing... interestingly enough, also using 2 A100 GPUs. I'm on torch 2.0.1 (the default `pip3 install torch`)

I've also had to go down to a 2x A100 setup because otherwise I run into the NCCL error; nothing larger seems to work

> I also had this problem but it works after making this: `export NCCL_P2P_LEVEL=NVL`
> On a 4xA100 80GB, verified two times it solved the issue when present

Were you on torch...
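In case it helps anyone else hitting this, here's a rough sketch of setting that variable from Python rather than via the shell `export` above; the torchrun-style launch and everything besides `NCCL_P2P_LEVEL` itself are assumptions on my part, not a verified recipe:

```python
import os

# Restrict NCCL peer-to-peer transfers to NVLink paths; this has to be set
# before the process group (and thus NCCL) is initialized.
os.environ.setdefault("NCCL_P2P_LEVEL", "NVL")

import torch
import torch.distributed as dist

# Assumes a single-node multi-GPU launch via torchrun, which provides
# RANK / WORLD_SIZE / LOCAL_RANK in the environment.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)
print(f"rank {dist.get_rank()} up with NCCL_P2P_LEVEL={os.environ['NCCL_P2P_LEVEL']}")
```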

It appears as if this may have broken FSDP. For example, as specified in the Alpaca repo, finetuning with `--fsdp "full_shard auto_wrap" --fsdp_transformer_layer_cls_to_wrap LlamaDecoderLayer` worked before this commit, but...
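For context, this is roughly what those flags correspond to if you build the `TrainingArguments` yourself; it's just a sketch of the setup I'd expect to have worked before the commit, with placeholder values, not a verified repro:

```python
# Sketch of the Alpaca-style FSDP flags expressed directly as HF
# TrainingArguments (output dir, batch size, etc. are placeholders).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./llama-finetune",    # placeholder
    per_device_train_batch_size=4,    # placeholder
    bf16=True,
    fsdp="full_shard auto_wrap",      # same as --fsdp "full_shard auto_wrap"
    fsdp_transformer_layer_cls_to_wrap="LlamaDecoderLayer",
)
```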

I'll merge this in later today -- are there any other changes for 47?

I'm not actively working on the project at the moment (work, school, and life in general happen, as you can imagine), but I will approve any good PRs and rebuild...

@microsoft-github-policy-service agree

Yup, no problem! Will have it updated shortly 🙂

Should be all set @slundberg. It's just a little bit ugly: since `LLaMA` and `MPT` load the models themselves, it required duplicating the arguments to `Transformers.__init__` in the `__init__` for these...
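To show what I mean by the duplication, here's a rough sketch; the class and argument names are illustrative, not the exact `guidance` API:

```python
# Illustrative only: a base class that expects an already-loaded model, and a
# subclass that loads the model itself and so must repeat the base keywords.
from transformers import AutoModelForCausalLM, AutoTokenizer


class Transformers:
    """Base class: normally receives an already-loaded model and tokenizer."""
    def __init__(self, model=None, tokenizer=None, caching=True,
                 token_healing=True, device=None):
        self.model = model
        self.tokenizer = tokenizer
        self.caching = caching
        self.token_healing = token_healing
        self.device = device


class LLaMA(Transformers):
    """Loads the model from a path itself, so the base class's keyword
    arguments are re-declared here and forwarded manually."""
    def __init__(self, model_path, caching=True, token_healing=True, device=None):
        model = AutoModelForCausalLM.from_pretrained(model_path)
        tokenizer = AutoTokenizer.from_pretrained(model_path)
        super().__init__(model=model, tokenizer=tokenizer, caching=caching,
                         token_healing=token_healing, device=device)
```

A cleaner option might be to collect the shared keywords into `**kwargs` and forward them, but repeating them keeps the signatures self-documenting.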