[tests] fix deepspeed zero3 config for `test_stage3_nvme_offload`

Open faaany opened this issue 1 year ago • 1 comments

What does this PR do?

Since we manually modified the original zero3 config value here, we will end up with a ValueError in accelerate (code).

For ZeRO-3 checkpointing, we need to set this value to True.
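As a rough illustration of the kind of change described above, here is a minimal sketch of flipping a flag in a ZeRO-3 config dict before it is handed to the trainer. The description does not name the config key, so the key used here (`stage3_gather_16bit_weights_on_model_save`, a real DeepSpeed option) is an assumption, and the helper `enable_zero3_16bit_save` and the trimmed-down config are hypothetical stand-ins, not the actual test code.

```python
import copy

# Assumption: the flag in question is DeepSpeed's
# `stage3_gather_16bit_weights_on_model_save`; the PR text does not name it.
ASSUMED_KEY = "stage3_gather_16bit_weights_on_model_save"


def enable_zero3_16bit_save(ds_config: dict) -> dict:
    """Return a copy of a config dict with the assumed flag set to True.

    Hypothetical helper for illustration; only touches ZeRO stage 3 configs.
    """
    patched = copy.deepcopy(ds_config)
    zero_opt = patched.setdefault("zero_optimization", {})
    if zero_opt.get("stage") == 3:
        zero_opt[ASSUMED_KEY] = True
    return patched


# Hypothetical, trimmed-down config standing in for the test's zero3 config.
original = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "nvme", "nvme_path": "/tmp/nvme"},
        ASSUMED_KEY: False,
    }
}

patched = enable_zero3_16bit_save(original)
```

Working on a deep copy keeps the shared base config untouched, so other tests that reuse it would not see the modified value.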

@amyeroberts and @ydshieh

faaany avatar Jul 10 '24 06:07 faaany

OK for me.

yao-matrix avatar Jul 10 '24 06:07 yao-matrix

Hi @ydshieh , could you take a look at this PR? Thx!

faaany avatar Jul 15 '24 21:07 faaany

Hi @faaany Works for me, but the link

we will end up with a ValueError in accelerate (code).

doesn't seem to point to the desired line/block?

ydshieh avatar Jul 16 '24 13:07 ydshieh

Oh, that's a mis-click. This is the correct line: https://github.com/huggingface/accelerate/blob/main/src/accelerate/accelerator.py#L3278

faaany avatar Jul 16 '24 13:07 faaany
