Cannot export Donut models to ONNX
System Info
- `transformers` version: 4.24.0.dev0
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.8.13
- Huggingface_hub version: 0.10.0
- PyTorch version (GPU?): 1.12.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
It seems that the default tolerance of 1e-5 in the ONNX configuration for vision-encoder-decoder models is too small for Donut checkpoints (a tolerance of 5e-3 to 9e-3 currently appears to be needed). As a result, many (all?) Donut checkpoints can't be exported using the default values in the CLI.
Having said that, the relatively large discrepancy in the exported models suggests there is a deeper issue involved with tracing these models, and it would be great to rule out this potential source of error before increasing the default value for `atol`.
Steps to reproduce:
- Pick one of the Donut checkpoints from the `naver-clova-ix` org on the Hub
- Export the model using the ONNX CLI, e.g.
python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-docvqa --feature=vision2seq-lm onnx/
- The above gives the following error:
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.0091094970703125 for [ -0.6990948 -49.217014 3.7758636 ... 3.2241364 2.7353969
-51.43289 ] vs [ -0.6989002 -49.215897 3.7760048 ... 3.223978 2.7355423
-51.433964 ]
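For reference, the validation step boils down to a max-absolute-difference check against `atol`. Below is a minimal sketch of that check (`max_abs_diff` is an illustrative helper, not the exporter's actual code), fed with the handful of values visible in the error message above:

```python
# Rough sketch (not the actual validate_model_outputs implementation) of
# how the exporter decides pass/fail: compare the reference PyTorch
# outputs against the ONNX Runtime outputs element-wise and fail when
# the max absolute difference exceeds atol.

def max_abs_diff(reference, exported):
    """Largest element-wise absolute difference between two flat sequences."""
    return max(abs(a - b) for a, b in zip(reference, exported))

# The first/last values printed in the error message above:
reference = [-0.6990948, -49.217014, 3.7758636, 3.2241364, 2.7353969, -51.43289]
exported = [-0.6989002, -49.215897, 3.7760048, 3.223978, 2.7355423, -51.433964]

diff = max_abs_diff(reference, exported)
# Even these few visible values already exceed the 1e-4/1e-5 defaults,
# while the reported max over the full tensor is ~9.1e-3.
print(f"max abs diff: {diff:.6f}, passes atol=1e-4: {diff < 1e-4}")
```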
Full stack trace
Framework not requested. Using torch to export to ONNX.
Downloading: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 4.74k/4.74k [00:00<00:00, 791kB/s]
Downloading: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 803M/803M [00:09<00:00, 81.2MB/s]
/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:2895.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Downloading: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 363/363 [00:00<00:00, 85.0kB/s]
Using framework PyTorch: 1.12.1
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:230: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if num_channels != self.num_channels:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:220: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if width % self.patch_size[1] != 0:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:223: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if height % self.patch_size[0] != 0:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:536: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if min(input_resolution) <= self.window_size:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:136: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
batch_size, height // window_size, window_size, width // window_size, window_size, num_channels
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:148: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
windows = windows.view(-1, height // window_size, width // window_size, window_size, window_size, num_channels)
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:622: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
was_padded = pad_values[3] > 0 or pad_values[5] > 0
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:623: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if was_padded:
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:411: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
batch_size // mask_shape, mask_shape, self.num_attention_heads, dim, dim
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:682: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
height_downsampled, width_downsampled = (height + 1) // 2, (width + 1) // 2
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:266: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
should_pad = (height % 2 == 1) or (width % 2 == 1)
/Users/lewtun/git/hf/transformers/src/transformers/models/donut/modeling_donut_swin.py:267: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if should_pad:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
*(the warning above is repeated ~40 times)*
Validating ONNX model...
-[✓] ONNX model output names match reference model ({'last_hidden_state'})
- Validating ONNX Model output "last_hidden_state":
-[✓] (3, 4800, 1024) matches (3, 4800, 1024)
-[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
File "/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/lewtun/miniconda3/envs/transformers/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/lewtun/git/hf/transformers/src/transformers/onnx/__main__.py", line 180, in <module>
main()
File "/Users/lewtun/git/hf/transformers/src/transformers/onnx/__main__.py", line 107, in main
validate_model_outputs(
File "/Users/lewtun/git/hf/transformers/src/transformers/onnx/convert.py", line 455, in validate_model_outputs
raise ValueError(
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.0091094970703125 for [ -0.6990948 -49.217014 3.7758636 ... 3.2241364 2.7353969
-51.43289 ] vs [ -0.6989002 -49.215897 3.7760048 ... 3.223978 2.7355423
-51.433964 ]
Expected behavior
Donut checkpoints can be exported to ONNX, either through a sensible default value for `atol` or through changes to the modeling code that bring the original and exported models into much closer agreement.
cc @mht-sharma would you mind taking a look at this? It might be related to some of the subtleties you noticed with Whisper when passing encoder outputs through the model vs. using the getters.
python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx --atol 1e-2
With `--atol 1e-2` the export succeeds, but that tolerance is quite loose.
I think it is better to convert the model separately:
- Encoder
- Decoder
- Decoder with past key values

and pipeline them together.
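The split could be wired together roughly as follows. This is an illustrative sketch only: the two `*_session` functions are stand-in Python stubs with hypothetical names, not real `onnxruntime.InferenceSession` objects; with actual exported models each stub would become a `session.run(...)` call on the corresponding ONNX file.

```python
# Sketch of pipelining separately exported encoder/decoder models:
# run the encoder once, then loop the decoder greedily, feeding the
# generated tokens back in. The "sessions" below are toy stubs.

EOS, BOS, VOCAB = 2, 0, 8

def encoder_session(pixel_values):
    # stub: a real session would return `last_hidden_state`
    return [sum(pixel_values)]

def decoder_session(input_ids, encoder_hidden_states):
    # stub: a real session would return logits over the vocabulary;
    # here we emit a deterministic toy distribution that reaches EOS
    step = len(input_ids)
    logits = [0.0] * VOCAB
    logits[EOS if step >= 3 else (step + 3)] = 1.0
    return logits

def generate(pixel_values, max_length=10):
    hidden = encoder_session(pixel_values)  # encoder runs exactly once
    input_ids = [BOS]
    for _ in range(max_length):
        logits = decoder_session(input_ids, hidden)
        next_id = max(range(VOCAB), key=lambda i: logits[i])  # greedy argmax
        input_ids.append(next_id)
        if next_id == EOS:
            break
    return input_ids

print(generate([0.1, 0.2]))  # a short sequence ending with EOS
```

A decoder-with-past variant would additionally return and re-feed cached key/value tensors so each step only processes the newest token.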
@BakingBrains I mentioned this here #19401
Update:
The error occurs only in the encoder part of the model, i.e. the Donut Swin encoder. I updated the model inputs to actual inputs from the dataset; however, the issue still persisted.
The issue starts from the layer activation at modeling_donut_swin.py#L501 in `DonutSwinLayer`. The `GeluActivation` causes the outputs to diverge between the original and ONNX models. After removing the activation or using `relu`, the model validates down to an atol of 1e-4.
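As a standalone data point (stdlib only, independent of the exporter): the exact erf-based GELU and the common tanh approximation differ by up to a few times 1e-4 in absolute terms, the same order of magnitude as the atol the validation fails at. This does not prove that the ONNX graph swaps GELU variants; it only shows that GELU formulations diverge at a relevant scale.

```python
import math

def gelu_exact(x):
    # exact GELU: x * Phi(x), using the error function
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # common tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# scan a representative input range
xs = [i / 100.0 for i in range(-500, 501)]
max_diff = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - tanh| on [-5, 5]: {max_diff:.2e}")
```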
The original SwinModel also uses this activation: https://huggingface.co/microsoft/swin-base-patch4-window7-224-in22k/raw/main/config.json
If you try to convert that model, you don't get this issue.
Any updates on it @mht-sharma ?
Hi, @lewtun & @mht-sharma any updates?
Hi @WaterKnight1998, apologies for the late response. I was not able to work actively on the issue in the past few weeks. However, I have seen similar issues with other models, and it was mainly because of sensitivity to the inputs. This model also showed similar behaviour when trying different inputs during validation; however, the error was still around 0.001X.
Since the model architecture of `SwinModel` and the Donut encoder is the same, it's highly likely that the issue is with the inputs used. But I will validate this once and get back to you in a few days.
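A tiny illustration of why two "correct" graphs can still disagree numerically for some inputs: floating-point addition is not associative, so any operator reordering or fusion during export shifts results slightly, and such shifts can compound across hundreds of layers. (The values here are arbitrary; the effect, not the magnitude, is the point.)

```python
# Floating-point addition is not associative: reordering the same
# operations produces a slightly different result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right, abs(left - right))  # differs in the last ulp
```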
Thank you for the explanation. I am looking forward to your fix :)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @WaterKnight1998 @mht-sharma ,
Do you have an inference script for the Donut document-parsing model using the encoder and decoder ONNX models? Something similar to this TrOCR gist.