Can accelerator.prepare only be run once?
System Info
- `Accelerate` version: 0.28.0
- Platform: Linux-5.4.250-2-velinux1u1-amd64-x86_64-with-glibc2.29
- Python version: 3.8.10
- Numpy version: 1.21.0
- PyTorch version (GPU?): 2.2.0+cu118 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 2015.16 GB
- GPU type: NVIDIA A100-SXM4-80GB
- `Accelerate` default config:
Not found
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
I wrote code that calls accelerator.prepare more than once:

model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader)
lr_scheduler = accelerator.prepare(lr_scheduler)

With this, lr_scheduler.step() behaves differently than when everything is prepared in a single call:
- With a single prepare call, inside `with accelerator.accumulate(model):` the prepared lr_scheduler.step() runs num_processes times on every real optimizer step (see the library code; a rough paraphrase follows below).
- With two prepare calls, where lr_scheduler is prepared separately afterwards, lr_scheduler.step() runs only once on every step inside `with accelerator.accumulate(model):`.

When lr_scheduler is passed to a second prepare call, does it interact differently with `with accelerator.accumulate(model):`?
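For context, a rough paraphrase of what the prepared scheduler appears to do on step() when everything is prepared together, based on the behavior described above (illustrative only, not the actual library source):

```python
# Rough paraphrase of the prepared scheduler's step() behavior
# (illustrative only, not the actual library code).
def wrapped_scheduler_step(scheduler, sync_gradients, num_processes):
    if not sync_gradients:
        # Still accumulating gradients: do not advance the schedule.
        return
    # A real optimizer step just happened: advance once per process,
    # since the effective batch size is num_processes times larger.
    for _ in range(num_processes):
        scheduler.step()
```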
Expected behavior
An explanation of why the two ways of calling prepare behave differently.
Hi @DavideHe, thanks for raising the issue. Could you share a minimal reproducer? The lr_scheduler should behave the same whether it is passed in the first or in a second prepare call. However, we expect users to call prepare only once. What behavior were you expecting? With accelerator.accumulate(model), the lr_scheduler should be updated after every gradient_accumulation_steps iterations. See the related issue https://github.com/huggingface/accelerate/issues/963
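For reference, here is a minimal, self-contained sketch of the single-prepare pattern; the model, data, and hyperparameters are dummy placeholders for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Gradient accumulation is configured on the Accelerator itself
# (4 is an arbitrary illustrative value).
accelerator = Accelerator(gradient_accumulation_steps=4)

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)
dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
train_dataloader = DataLoader(dataset, batch_size=4)

# Everything, including the scheduler, goes through one prepare() call.
model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
    model, optimizer, train_dataloader, lr_scheduler
)

for inputs, targets in train_dataloader:
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        # The prepared scheduler skips its update on iterations where the
        # optimizer step is skipped for accumulation, so the learning rate
        # should only change once every gradient_accumulation_steps iterations.
        lr_scheduler.step()
        optimizer.zero_grad()
        print(lr_scheduler.get_last_lr()[-1])
```

Run on a single process, the printed learning rate should only change every 4 iterations.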
prepare twice:

model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader)
lr_scheduler = accelerator.prepare(lr_scheduler)

for data in train_dataloader:
    with accelerator.accumulate(model):
        lr_scheduler.step()
        print(lr_scheduler.get_last_lr()[-1])

With the code above, the learning rate updates on every step when gradient_accumulation_steps > 1. But when everything is prepared in a single call, the learning rate only updates once every gradient_accumulation_steps steps.
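One way to see where the two setups diverge is to log `accelerator.sync_gradients` next to the learning rate; it should be True only on the iteration where the real optimizer step happens. A debugging sketch, assuming the dummy model, data, and prepared objects from the single-prepare example above:

```python
for step, (inputs, targets) in enumerate(train_dataloader):
    with accelerator.accumulate(model):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        # sync_gradients is True only once every gradient_accumulation_steps
        # iterations; a correctly wrapped scheduler should only change the
        # learning rate on those iterations.
        print(step, accelerator.sync_gradients, lr_scheduler.get_last_lr()[-1])
```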
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.