
Cannot export Donut models to ONNX

Open lewtun opened this issue 3 years ago • 12 comments

System Info

  • transformers version: 4.24.0.dev0
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.8.13
  • Huggingface_hub version: 0.10.0
  • PyTorch version (GPU?): 1.12.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

It seems that the default tolerance of 1e-5 in the ONNX configuration for vision-encoder-decoder models is too small for Donut checkpoints (an atol of roughly 5e-3 to 9e-3 currently appears to be needed). As a result, many (all?) Donut checkpoints can't be exported using the default values in the CLI.

Having said that, the relatively large discrepancy in the exported models suggests there is a deeper issue with tracing these models, and it would be great to eliminate this potential source of error before increasing the default value for atol.
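For context, the validation step of the export boils down to an elementwise np.allclose comparison between the PyTorch and ONNX Runtime outputs. A small sketch with values in the ballpark of the mismatch reported below (illustrative numbers only, not the exporter's actual code):

```python
import numpy as np

# Illustrative slices of the two outputs, taken from the error message below
reference = np.array([-0.6990948, -49.217014, 3.7758636])  # PyTorch model
exported = np.array([-0.6989002, -49.215897, 3.7760048])   # ONNX model

max_diff = np.max(np.abs(reference - exported))
print(max_diff)  # ~1.1e-3 for this slice

# Validation passes only when every element is within tolerance
print(np.allclose(reference, exported, atol=1e-5))  # False -> export fails
print(np.allclose(reference, exported, atol=1e-2))  # True  -> export passes
```

This is why bumping atol "fixes" the export: the check is binary, so any per-element difference above the tolerance fails the whole validation.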

Steps to reproduce:

  1. Pick one of the Donut checkpoints from the naver-clova-ix org on the Hub
  2. Export the model using the ONNX CLI, e.g.
python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-docvqa --feature=vision2seq-lm onnx/
  3. The above gives the following error:
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.0091094970703125 for [ -0.6990948 -49.217014    3.7758636 ...   3.2241364   2.7353969
 -51.43289  ] vs [ -0.6989002 -49.215897    3.7760048 ...   3.223978    2.7355423
 -51.433964 ]
Full stack trace

Framework not requested. Using torch to export to ONNX.
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.74k/4.74k [00:00<00:00, 791kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 803M/803M [00:09<00:00, 81.2MB/s]
/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 363/363 [00:00<00:00, 85.0kB/s]
Using framework PyTorch: 1.12.1
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:230: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_channels != self.num_channels:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:220: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if width % self.patch_size[1] != 0:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:223: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height % self.patch_size[0] != 0:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:536: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min(input_resolution) <= self.window_size:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:136: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size, height // window_size, window_size, width // window_size, window_size, num_channels
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:148: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  windows = windows.view(-1, height // window_size, width // window_size, window_size, window_size, num_channels)
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:622: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  was_padded = pad_values[3] > 0 or pad_values[5] > 0
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:623: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if was_padded:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:411: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size // mask_shape, mask_shape, self.num_attention_heads, dim, dim
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:682: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  height_downsampled, width_downsampled = (height + 1) // 2, (width + 1) // 2
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:266: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  should_pad = (height % 2 == 1) or (width % 2 == 1)
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:267: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if should_pad:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
[the prim::Constant warning above is repeated 39 more times]
Validating ONNX model...
        -[✓] ONNX model output names match reference model ({'last_hidden_state'})
        - Validating ONNX Model output "last_hidden_state":
                -[✓] (3, 4800, 1024) matches (3, 4800, 1024)
                -[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
  File "/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/lewtun/git/hf/transformers/src/transformers/onnx/__main__.py", line 180, in <module>
    main()
  File "/Users/lewtun/git/hf/transformers/src/transformers/onnx/__main__.py", line 107, in main
    validate_model_outputs(
  File "/Users/lewtun/git/hf/transformers/src/transformers/onnx/convert.py", line 455, in validate_model_outputs
    raise ValueError(
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.0091094970703125 for [ -0.6990948 -49.217014    3.7758636 ...   3.2241364   2.7353969
 -51.43289  ] vs [ -0.6989002 -49.215897    3.7760048 ...   3.223978    2.7355423
 -51.433964 ]

Expected behavior

Donut checkpoints can be exported to ONNX, either via a sensible default value for atol or via changes to the modeling code that enable much better agreement between the original and exported models.

lewtun avatar Oct 31 '22 14:10 lewtun

cc @mht-sharma would you mind taking a look at this? It might be related to some of the subtleties you noticed with Whisper and passing encoder outputs through the model vs using the getters

lewtun avatar Oct 31 '22 14:10 lewtun

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx --atol 1e-2

With --atol 1e-2 it works, but that tolerance is quite loose.

I think it is better to convert the model separately:

  • Encoder
  • Decoder
  • Decoder with past key values

and pipeline them together.
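The pipelining step described above is essentially a greedy decoding loop over the two exported graphs. A minimal sketch of that loop, with plain callables standing in for the onnxruntime `InferenceSession.run` calls on the separately exported encoder and decoder (the wrappers, shapes, and token ids here are hypothetical):

```python
import numpy as np

def generate(run_encoder, run_decoder, pixel_values,
             start_token_id, eos_token_id, max_length=16):
    """Greedy decoding over a split encoder/decoder ONNX export.

    run_encoder / run_decoder stand in for InferenceSession.run calls
    on the two exported models.
    """
    # The encoder runs once per image
    encoder_hidden_states = run_encoder(pixel_values)
    input_ids = [start_token_id]
    for _ in range(max_length):
        # The decoder re-runs on the growing sequence; the "with past
        # key values" export would avoid this recomputation by caching
        logits = run_decoder(np.array([input_ids]), encoder_hidden_states)
        next_token = int(np.argmax(logits[0, -1]))
        input_ids.append(next_token)
        if next_token == eos_token_id:
            break
    return input_ids
```

With real exported models, the two callables would wrap `session.run` using whatever input names the export produced.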

BakingBrains avatar Oct 31 '22 14:10 BakingBrains

@BakingBrains I mentioned this here #19401

WaterKnight1998 avatar Nov 07 '22 16:11 WaterKnight1998

Update: the error occurs only in the encoder part of the model, i.e. the Donut Swin encoder. I updated the model inputs to actual inputs from a dataset; however, the issue still persisted.

The issue starts from the layer activation at modeling_donut_swin.py#L501 in DonutSwinLayer. The GELU activation causes the outputs to diverge between the original and ONNX models. After removing the activation or using ReLU instead, the model validates down to an atol of 1e-4.
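One plausible mechanism (an editor's illustration, not a confirmed diagnosis of this export): the exact erf-based GELU and the common tanh approximation agree only to a few 1e-4, so if the traced graph and the eager model end up on different variants, per-layer discrepancies of that order can accumulate:

```python
import math
import numpy as np

def gelu_exact(x):
    # erf-based GELU, as in torch.nn.GELU()
    return 0.5 * x * (1.0 + np.vectorize(math.erf)(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # tanh approximation, as in torch.nn.GELU(approximate="tanh")
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x ** 3)))

x = np.linspace(-4.0, 4.0, 2001)
max_diff = np.max(np.abs(gelu_exact(x) - gelu_tanh(x)))
print(max_diff)  # a few 1e-4 per layer, before any accumulation
```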

mht-sharma avatar Nov 09 '22 04:11 mht-sharma

Update: the error occurs only in the encoder part of the model, i.e. the Donut Swin encoder. I updated the model inputs to actual inputs from a dataset; however, the issue still persisted.

The issue starts from the layer activation at modeling_donut_swin.py#L501 in DonutSwinLayer. The GELU activation causes the outputs to diverge between the original and ONNX models. After removing the activation or using ReLU instead, the model validates down to an atol of 1e-4.

The original SwinModel also uses this activation: https://huggingface.co/microsoft/swin-base-patch4-window7-224-in22k/raw/main/config.json

If you try to convert that model, you don't get this issue.

WaterKnight1998 avatar Nov 09 '22 09:11 WaterKnight1998

Any updates on this, @mht-sharma?

WaterKnight1998 avatar Nov 15 '22 11:11 WaterKnight1998

Hi, @lewtun & @mht-sharma any updates?

WaterKnight1998 avatar Dec 07 '22 11:12 WaterKnight1998

Hi @WaterKnight1998, apologies for the late response. I was not able to work actively on this issue over the past few weeks. However, I have seen similar issues with other models, and they were mainly due to sensitivity to the inputs. This model showed similar behaviour when trying different inputs during validation; however, the error was still around 0.001X.

Since the model architecture of SwinModel and the Donut encoder is the same, it's highly likely that the issue lies with the inputs used. I will validate this and get back to you in a few days.

mht-sharma avatar Dec 07 '22 12:12 mht-sharma

Hi @WaterKnight1998, apologies for the late response. I was not able to work actively on this issue over the past few weeks. However, I have seen similar issues with other models, and they were mainly due to sensitivity to the inputs. This model showed similar behaviour when trying different inputs during validation; however, the error was still around 0.001X.

Since the model architecture of SwinModel and the Donut encoder is the same, it's highly likely that the issue lies with the inputs used. I will validate this and get back to you in a few days.

Thank you for the explanation. I am looking forward to your fix :)

WaterKnight1998 avatar Dec 07 '22 18:12 WaterKnight1998

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 01 '23 15:01 github-actions[bot]

Hi @WaterKnight1998 @mht-sharma ,

Do you have an inference script for the Donut document parsing model that uses the encoder and decoder ONNX models? Something similar to this TrOCR gist.

satkatai avatar Jan 03 '23 12:01 satkatai

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 27 '23 15:01 github-actions[bot]