
[question] Text detection evaluation is so long with the rotation flag.

decadance-dance opened this issue 1 year ago · 4 comments

Bug description

I am testing db_resnet50 with the provided doctr/references/detection/evaluate_pytorch.py script. When I don't use "--rotation", the eval script is relatively fast (about 40 s on FUNSD), but when I use that flag it takes very long (more than 30 min, and I am still waiting). I understand that it should be slower with '--rotation' because the script needs to handle polygons, but the actual time is much longer than I would expect. Can someone double-check this?

Code snippet to reproduce the bug

python evaluate_pytorch.py db_resnet50 --rotation --amp -b 32

Error traceback

Input without --rotation:

Namespace(arch='db_resnet50', dataset='FUNSD', batch_size=32, device=None, size=None, workers=None, rotation=False, resume=None, amp=True)
Unpacking FUNSD: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 149/149 [00:00<00:00, 2850.94it/s]
Unpacking FUNSD: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 2669.90it/s]
Test set loaded in 0.7717s (199 samples in 7 batches)
Running evaluation
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:40<00:00,  5.82s/it]
Validation loss: 0.732066 (Recall: 83.55% | Precision: 86.67% | Mean IoU: 67.83%)

Input with --rotation: Still running xD

Environment

DocTR version: 0.8.0a0
TensorFlow version: N/A
PyTorch version: 2.1.0a0+4136153 (torchvision 0.16.0a0)
OpenCV version: 4.9.0
OS: Ubuntu 22.04.2 LTS
Python version: 3.10.6
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.1.105
GPU models and configuration: GPU 0: NVIDIA A30
Nvidia driver version: 525.147.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.2

Deep Learning backend

is_tf_available: False
is_torch_available: True

decadance-dance avatar Feb 15 '24 17:02 decadance-dance

Probably this is a duplicate, but I am wondering if there is a solution? I am using broadcasting but it is still slow.
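For reference, the broadcasting approach for the straight-box case looks roughly like this (a simplified sketch, not the exact docTR code) — one vectorized pass over all N×M pairs, which is why the non-rotated path is fast:

```python
import numpy as np

def box_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU for axis-aligned boxes in (xmin, ymin, xmax, ymax) format.

    boxes_a: (N, 4), boxes_b: (M, 4) -> returns an (N, M) IoU matrix.
    """
    # Broadcast (N, 1, 2) against (1, M, 2) to get every pairwise overlap.
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bot_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(bot_right - top_left, 0, None)          # zero if no overlap
    inter = wh[..., 0] * wh[..., 1]                      # (N, M)

    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)
```

With rotated polygons there is no such closed-form overlap, so each pair needs an explicit geometric (or rasterized) intersection, which is where the time goes.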

decadance-dance avatar Feb 15 '24 17:02 decadance-dance

Hi @decadance-dance 👋, you are right, we need to find a better/faster solution for the calculation; that's a known issue. :) (Maybe a refactoring to shapely could help.) Unfortunately, we are currently only 2 people working on docTR in our spare time, so it's difficult to cover everything at once 😅
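To illustrate what a shapely-based path could look like, here is a minimal sketch of IoU for two (possibly rotated) polygons — not the actual docTR implementation, just the idea behind the suggested refactoring:

```python
from shapely.geometry import Polygon

def polygon_iou(poly_a, poly_b) -> float:
    """IoU of two polygons given as sequences of (x, y) vertices.

    Shapely handles the exact geometric intersection of rotated shapes,
    avoiding any per-pair mask rasterization.
    """
    a, b = Polygon(poly_a), Polygon(poly_b)
    if not a.intersects(b):
        return 0.0  # cheap early exit, no area computation needed
    inter = a.intersection(b).area
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0
```

Combined with an STRtree spatial index to prune non-overlapping pairs first, this could cut most of the pairwise work.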

felixdittrich92 avatar Feb 15 '24 19:02 felixdittrich92

@felixdittrich92 got it. Maybe I'll take a look at this issue later to optimize it.

decadance-dance avatar Feb 16 '24 09:02 decadance-dance

> @felixdittrich92 got it. Maybe I'll take a look at this issue later to optimize it.

Sounds good, we are happy about every contribution 👍🏼

felixT2K avatar Feb 16 '24 10:02 felixT2K