changes to _maybe_log_save_evaluate() not reflected in optimum repo
System Info
transformers & optimum installed from source on 2/26/2024
Who can help?
@amyeroberts @JingyaHuang @regisss
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
Can be reproduced by running the current image-classification fine-tuning example at optimum/examples/onnxruntime/training/image-classification/run_image_classification.py with the following command:
torchrun run_image_classification.py \
--model_name_or_path google/vit-base-patch16-224-in21k \
--dataset_name beans \
--output_dir ./beans_outputs/ \
--remove_unused_columns False \
--label_column_name labels \
--do_train \
--do_eval \
--learning_rate 2e-5 \
--num_train_epochs 10 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--logging_strategy steps \
--logging_steps 10 \
--evaluation_strategy epoch \
--seed 1337
Expected behavior
Recently, https://github.com/huggingface/transformers/pull/27326 introduced gradient-norm logging in transformers' Trainer. That change is not reflected in the optimum repo, resulting in the following error:
Traceback (most recent call last):
File "run_image_classification.py", line 451, in <module>
main()
File "run_image_classification.py", line 425, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/opt/conda/envs/ptca/lib/python3.8/site-packages/optimum/onnxruntime/trainer.py", line 392, in train
return inner_training_loop(
File "/opt/conda/envs/ptca/lib/python3.8/site-packages/optimum/onnxruntime/trainer.py", line 774, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
TypeError: _maybe_log_save_evaluate() missing 1 required positional argument: 'ignore_keys_for_eval'
WORKAROUND: adjust trainer.py to pass None where the grad_norm argument is expected, since that is its default value:
- self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
+ self._maybe_log_save_evaluate(tr_loss, None, model, trial, epoch, ignore_keys_for_eval)
Caused by https://github.com/huggingface/transformers/commit/4f09d0fd888dbf2660313f9715992822acfb99ce. Fixed in PR #1730.
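For illustration, a minimal standalone sketch of why this TypeError occurs (the class and method names below are stand-ins, not the real transformers/optimum implementations): the parent class gained a new positional parameter (grad_norm) in the middle of the signature, while the subclass's copied training loop still calls it with the old argument list, so Python shifts every argument over by one and reports the last parameter as missing.

```python
class Trainer:
    """Stand-in for transformers.Trainer after PR #27326 added grad_norm."""

    def _maybe_log_save_evaluate(self, tr_loss, grad_norm, model, trial,
                                 epoch, ignore_keys_for_eval):
        return grad_norm


class ORTTrainer(Trainer):
    """Stand-in for optimum's ORTTrainer with the older, copied call site."""

    def old_call(self, tr_loss, model, trial, epoch, ignore_keys_for_eval):
        # Pre-change call site: five arguments, so `model` binds to
        # grad_norm and ignore_keys_for_eval is reported as missing.
        return self._maybe_log_save_evaluate(tr_loss, model, trial,
                                             epoch, ignore_keys_for_eval)

    def patched_call(self, tr_loss, model, trial, epoch, ignore_keys_for_eval):
        # Workaround from above: pass None explicitly for grad_norm.
        return self._maybe_log_save_evaluate(tr_loss, None, model, trial,
                                             epoch, ignore_keys_for_eval)


trainer = ORTTrainer()
try:
    trainer.old_call(0.0, "model", None, 1, None)
except TypeError as e:
    print(e)  # missing 1 required positional argument: 'ignore_keys_for_eval'
print(trainer.patched_call(0.0, "model", None, 1, None))  # None
```

The same mismatch disappears once optimum's training loop is regenerated against the new Trainer signature, which is what PR #1730 does.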