# Add DEIMV2 Object Detection Model

## Summary

Resolves #5015
- [x] Add DINOv3 and ViT-Tiny as backbones for detection, primarily for the DEIMv2 model
- [x] Add the DEIMv2 model (OTXModel, Encoder, Decoder), end-to-end training, and export
- [x] Experiment with pre-processing, Copy-blend, EMA, learning rate and its schedule, and model weights
- [x] Add unit tests and perf tests
- [x] Provide final benchmark numbers (vs. other DETR variants)
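One of the techniques experimented with above is an exponential moving average (EMA) of model weights. A minimal sketch of the idea (the decay value and the plain-dict weights are illustrative assumptions, not the actual DEIMv2 implementation):

```python
# EMA keeps a smoothed copy of the weights: ema = decay * ema + (1 - decay) * w.
# Evaluation on the EMA copy is typically more stable than on the raw weights.

def ema_update(ema_weights, model_weights, decay=0.999):
    """Blend the current weights into the EMA copy, key by key."""
    return {
        name: decay * ema_weights[name] + (1.0 - decay) * w
        for name, w in model_weights.items()
    }

# Toy usage: with decay=0.9, the EMA approaches the raw weight geometrically.
ema = {"w": 0.0}
for step in range(3):
    ema = ema_update(ema, {"w": 1.0}, decay=0.9)
# After 3 steps: 1 - 0.9**3 = 0.271
```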
## How to test

```shell
otx train --config src/otx/recipe/detection/deimv2_l.yaml --data_root tests/assets/car_tree_bug
```
## Checklist
- [x] The PR title and description are clear and descriptive
- [x] I have manually tested the changes
- [x] All changes are covered by automated tests
- [x] All related issues are linked to this PR (if applicable)
- [x] Documentation has been updated (if applicable)
Codecov Report
:x: Patch coverage is 83.37838% with 246 lines in your changes missing coverage. Please review.
Model manifests to be updated after a decision regarding which DETR models we want to expose.

Final benchmark (averaged across all datasets):
List of the datasets:
Number of runs: 5 (5 different seeds)
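Each `*_mean`/`*_std` pair in the table below is an aggregate over the 5 seeded runs. A minimal sketch of that aggregation (the per-seed scores are made up for illustration):

```python
# Aggregate a per-seed metric into the mean/std pair reported in the table.
from statistics import mean, stdev

def aggregate(per_seed_scores):
    """Return (mean, sample std) over the per-seed values."""
    return mean(per_seed_scores), stdev(per_seed_scores)

f1_per_seed = [0.61, 0.59, 0.62, 0.60, 0.58]  # one f1-score per seed (5 runs)
f1_mean, f1_std = aggregate(f1_per_seed)
# f1_mean == 0.60, f1_std ≈ 0.0158
```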
| otx_version | task | model | training:epoch_mean | training:epoch_std | training:e2e_time_mean | training:e2e_time_std | training:gpu_mem_mean | training:gpu_mem_std | training:train/iter_time_mean | training:train/iter_time_std | training:val/f1-score_mean | training:val/f1-score_std | torch:test/f1-score_mean | torch:test/f1-score_std | torch:test/iter_time_mean | torch:test/iter_time_std | torch:test/latency_mean | torch:test/latency_std | torch:test/e2e_time_mean | torch:test/e2e_time_std | export:test/f1-score_mean | export:test/f1-score_std | export:test/latency_mean | export:test/latency_std | export:test/e2e_time_mean | export:test/e2e_time_std | task |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.7.0dev | DETECTION | deim_dfine_l | 71.7073 | 15.0686 | 1477.3654 | 3048.6643 | 8.2641 | 2.7069 | 0.5438 | 0.1741 | 0.6679 | 0.1308 | 0.5572 | 0.2019 | 0.122 | 0.0141 | 0.0753 | 0.0607 | 3.7902 | 3.9503 | 0.559 | 0.1992 | 0.196 | 0.1271 | 10.0849 | 8.485 | DETECTION |
| 2.7.0dev | DETECTION | deim_dfine_m | 82.3902 | 21.246 | 1621.5693 | 3622.4548 | 6.6598 | 2.0274 | 0.4598 | 0.1824 | 0.6702 | 0.1201 | 0.5606 | 0.1843 | 0.0989 | 0.0144 | 0.0675 | 0.0577 | 3.4068 | 3.8209 | 0.5597 | 0.1834 | 0.154 | 0.1013 | 7.8983 | 6.8622 | DETECTION |
| 2.7.0dev | DETECTION | deim_dfine_x | 58.4634 | 13.5538 | 1409.8564 | 2897.2094 | 11.7544 | 2.7089 | 0.6032 | 0.1312 | 0.633 | 0.1562 | 0.5144 | 0.222 | 0.124 | 0.0168 | 0.0956 | 0.0819 | 4.4298 | 3.8482 | 0.5187 | 0.2171 | 0.2634 | 0.1181 | 14.941 | 12.8748 | DETECTION |
| 2.7.0dev | DETECTION | deimv2_l | 46.7805 | 13.1444 | 903.9662 | 1993.7543 | 9.0995 | 3.1126 | 0.4917 | 0.0971 | 0.6861 | 0.0971 | 0.6043 | 0.1395 | 0.102 | 0.0088 | 0.0757 | 0.062 | 3.7672 | 3.8452 | 0.604 | 0.1398 | 0.2569 | 0.1382 | 13.9423 | 11.5627 | DETECTION |
| 2.7.0dev | DETECTION | deimv2_m | 58.9024 | 16.0262 | 1029.4484 | 2173.2948 | 7.9583 | 3.4933 | 0.4768 | 0.1521 | 0.688 | 0.1002 | 0.5868 | 0.1565 | 0.0925 | 0.0099 | 0.067 | 0.0557 | 3.3796 | 3.6302 | 0.5845 | 0.156 | 0.206 | 0.124 | 10.6745 | 8.5066 | DETECTION |
| 2.7.0dev | DETECTION | deimv2_s | 56.1463 | 16.1006 | 966.5823 | 2220.3573 | 6.439 | 3.1682 | 0.4466 | 0.1447 | 0.6525 | 0.1213 | 0.565 | 0.1763 | 0.0854 | 0.011 | 0.0598 | 0.0503 | 3.0799 | 3.5161 | 0.5655 | 0.1756 | 0.1761 | 0.1232 | 8.5511 | 6.4467 | DETECTION |
| 2.7.0dev | DETECTION | deimv2_x | 47.2195 | 14.4283 | 1270.6535 | 2880.1889 | 12.7359 | 3.6657 | 0.6237 | 0.1241 | 0.6931 | 0.0937 | 0.6038 | 0.1436 | 0.1169 | 0.0106 | 0.0899 | 0.076 | 4.2772 | 3.9346 | 0.603 | 0.1434 | 0.3214 | 0.1663 | 17.4233 | 13.9527 | DETECTION |
| 2.7.0dev | DETECTION | dfine_x | 49.7317 | 18.3494 | 912.1143 | 1736.7324 | 10.3559 | 0.7046 | 0.5279 | 0.0389 | 0.6613 | 0.1277 | 0.5685 | 0.1751 | 0.1393 | 0.0092 | 0.0959 | 0.0786 | 4.5827 | 4.128 | 0.5686 | 0.175 | 0.2619 | 0.1257 | 14.6784 | 12.6574 | DETECTION |
DeimV2-S, DeimV2-M, and DeimV2-L are the recommended DETR models to expose in Geti Tune.
> Final benchmark (averaged across all datasets)

Do you know why the large (L) model trains faster than the small (S)?

> Do you know why the large (L) model trains faster than the small (S)?

It doesn't train faster; it converges faster. This chart shows end-to-end training time including early stopping. On average, the L model needs fewer epochs to reach high accuracy. Not 100% sure why; probably a better match with the hyperparameters.
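The table above bears this out. Dividing each model's `training:e2e_time_mean` by its `training:epoch_mean` gives an approximate per-epoch cost (a rough check, ignoring validation overhead):

```python
# Values copied from the deimv2_l and deimv2_s rows of the benchmark table.
e2e = {"deimv2_l": 903.9662, "deimv2_s": 966.5823}    # training:e2e_time_mean (s)
epochs = {"deimv2_l": 46.7805, "deimv2_s": 56.1463}   # training:epoch_mean

per_epoch = {m: e2e[m] / epochs[m] for m in e2e}
# S is cheaper per epoch (~17.2 s vs ~19.3 s), but L early-stops after
# fewer epochs, so its end-to-end training time ends up lower.
```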
> DeimV2-S, DeimV2-M, DeimV2-L are recommended DETR models to expose in Geti Tune

I agree with this proposal: the S, M, and L models show a good trade-off between F1 and speed, while the X model is slower than L without improving accuracy. Please go ahead with creating the manifests :)
