changes to _maybe_log_save_evaluate() not reflected in optimum repo

Open · prathikr opened this issue 2 years ago · 1 comment

System Info

transformers & optimum installed from source on 2/26/2024

Who can help?

@amyeroberts @JingyaHuang @regisss

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

To reproduce, run the current image-classification fine-tuning example at optimum/examples/onnxruntime/training/image-classification/run_image_classification.py with the following command:

torchrun run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name beans \
--output_dir ./beans_outputs/ \
--remove_unused_columns False \
--label_column_name labels \
--do_train \
--do_eval \
--learning_rate 2e-5 \
--num_train_epochs 10 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--logging_strategy steps \
--logging_steps 10 \
--evaluation_strategy epoch \
--seed 1337

Expected behavior

A recent change, https://github.com/huggingface/transformers/pull/27326, added gradient-norm logging to the transformers Trainer. That change is not reflected in the optimum repo, which results in the following error:

Traceback (most recent call last):
  File "run_image_classification.py", line 451, in <module>
    main()
  File "run_image_classification.py", line 425, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/optimum/onnxruntime/trainer.py", line 392, in train
    return inner_training_loop(
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/optimum/onnxruntime/trainer.py", line 774, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
TypeError: _maybe_log_save_evaluate() missing 1 required positional argument: 'ignore_keys_for_eval'

WORKAROUND: adjust optimum's trainer.py to pass None where the new grad_norm input is expected, as that is the default setting.

- self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
+ self._maybe_log_save_evaluate(tr_loss, None, model, trial, epoch, ignore_keys_for_eval)
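The mismatch can be illustrated in isolation with a minimal sketch. The classes `UpstreamTrainer` and `StaleTrainer` and the methods `old_call` and `patched_call` are hypothetical stand-ins, not the actual transformers or optimum code; only the argument order of `_maybe_log_save_evaluate` is taken from the traceback and the workaround above.

```python
class UpstreamTrainer:
    """Hypothetical stand-in for the updated base trainer (simplified)."""

    def _maybe_log_save_evaluate(
        self, tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval
    ):
        # The updated method takes grad_norm as its second argument.
        return {"loss": tr_loss, "grad_norm": grad_norm, "epoch": epoch}


class StaleTrainer(UpstreamTrainer):
    """Hypothetical stand-in for a subclass that still uses the old call."""

    def old_call(self, tr_loss, model, trial, epoch, ignore_keys_for_eval):
        # Old call site: five arguments, so every value shifts one slot
        # and the last parameter is left unfilled.
        return self._maybe_log_save_evaluate(
            tr_loss, model, trial, epoch, ignore_keys_for_eval
        )

    def patched_call(self, tr_loss, model, trial, epoch, ignore_keys_for_eval):
        # Workaround: pass None in the grad_norm position.
        return self._maybe_log_save_evaluate(
            tr_loss, None, model, trial, epoch, ignore_keys_for_eval
        )


trainer = StaleTrainer()

try:
    trainer.old_call(0.5, None, None, 1, None)
    old_error = None
except TypeError as exc:
    old_error = str(exc)

patched_result = trainer.patched_call(0.5, None, None, 1, None)

print(old_error)
print(patched_result)
```

Because the arguments are positional, the error names the last parameter (`ignore_keys_for_eval`) rather than `grad_norm`, even though `grad_norm` is the one that was added.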

prathikr · Feb 26 '24 22:02

It's caused by https://github.com/huggingface/transformers/commit/4f09d0fd888dbf2660313f9715992822acfb99ce. Fixed in PR #1730.

jingyanwangms · Feb 27 '24 18:02