Deep-Learning-Accelerator-SW Lower performance when trying to replicate the DLA Dense Performance results

I have a Jetson AGX Orin 64GB and I’m trying to reproduce the numbers in the Orin Dense Performance section of the page. I downloaded your models and used the command lines provided in the README.md in /scripts/prepare_models/.

Logs in verbose mode are here:

log_retinanet_resnext50_MAXN.txt log_retinanet_resnet34_MAXN.txt log_ssd_resnet34_MAXN.txt log_resnet50_MAXN.txt log_ssd_mobilenetv1_MAXN.txt

RetinaNet ResNeXt-50: yours is 78 fps, mine is 39 fps
RetinaNet ResNet-34: yours is 108 fps, mine is 53 fps
SSD-ResNet-34: yours is 83 fps, mine is 41 fps
ResNet-50: yours is 2037 fps, mine is 504 qps * 2 (batch) = 1008 fps.
SSD-MobileNetV1: yours is 2664 fps, mine is 655 qps * 2 (batch) = 1310 fps.

My results seem to be consistently around half of your reported results. I copied and pasted your command lines for execution, so I don’t think I’m missing an option. I double-checked that I was in MAXN power mode. I do not understand what I’m missing.

Thanks in advance!

Dec 04 '24 12:12 anguloqd

I had the same issue and then noticed it says "2x DLA images per second on a Jetson AGX Orin 64GB" above the table. It is not very clear, but I guess that means they are multiplying by two since there are two DLA cores on the AGX.
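As a rough sanity check of that reading, the sketch below doubles the numbers measured in the original post and compares them with the reported figures. It assumes the measured values come from a single DLA core and that throughput scales linearly across the two cores; the names `DLA_CORES`, `results`, `scaled` and `ratio` are made up for this illustration.

```python
# Assumption: the AGX Orin has two DLA cores and the measured numbers
# in the original post were taken on one of them.
DLA_CORES = 2

# model: (reported "2x DLA" fps, measured fps from the original post)
results = {
    "RetinaNet ResNeXt-50": (78, 39),
    "RetinaNet ResNet-34": (108, 53),
    "SSD-ResNet-34": (83, 41),
    "ResNet-50": (2037, 504 * 2),        # 504 qps * batch 2
    "SSD-MobileNetV1": (2664, 655 * 2),  # 655 qps * batch 2
}


def scaled(measured, cores=DLA_CORES):
    """Measured throughput multiplied by the number of DLA cores."""
    return measured * cores


def ratio(reported, measured, cores=DLA_CORES):
    """Scaled measurement divided by the reported figure (1.0 = exact match)."""
    return scaled(measured, cores) / reported


for name, (reported, measured) in results.items():
    print(
        f"{name}: {scaled(measured)} fps scaled vs {reported} fps reported "
        f"(ratio {ratio(reported, measured):.2f})"
    )
```

With the numbers from this thread, every scaled measurement lands within about 2% of the reported value, which is consistent with the table counting both DLA cores.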

Jan 30 '25 03:01 jquinn57

Yes. Tasks on the two DLA cores can run in parallel, so the performance we show corresponds to both DLA cores working concurrently.

Sep 19 '25 03:09 lynettez