
Lower performance when trying to replicate the DLA Dense Performance results

Open anguloqd opened this issue 1 year ago • 2 comments

I have a Jetson AGX Orin 64GB and I’m testing it specifically against the Orin Dense Performance section of the page. I downloaded your models and ran the command lines provided in the README.md in /scripts/prepare_models/.

Logs on verbose mode are here:

  • log_retinanet_resnext50_MAXN.txt
  • log_retinanet_resnet34_MAXN.txt
  • log_ssd_resnet34_MAXN.txt
  • log_resnet50_MAXN.txt
  • log_ssd_mobilenetv1_MAXN.txt

  • RetinaNet ResNeXt-50: yours is 78 fps, mine is 39 fps
  • RetinaNet ResNet-34: yours is 108 fps, mine is 53 fps
  • SSD-ResNet-34: yours is 83 fps, mine is 41 fps
  • ResNet-50: yours is 2037 fps, mine is 504 qps * 2 (batch) = 1008 fps.
  • SSD-MobileNetV1: yours is 2664 fps, mine is 655 qps * 2 (batch) = 1310 fps.

My results seem to be consistently around half of your reported results. I copied and pasted your command lines for execution, so I don’t think I’m missing an option. I double-checked that I was in MAXN power mode. I do not understand what I’m missing.

Thanks in advance!

anguloqd avatar Dec 04 '24 12:12 anguloqd

I had the same issue and then noticed it says "2x DLA images per second on a Jetson AGX Orin 64GB" above the table. It is not very clear, but I guess that means they are multiplying by two since there are two DLA cores on the AGX.

jquinn57 avatar Jan 30 '25 03:01 jquinn57

Yes. Because tasks on the two DLA cores can run in parallel, the performance we show corresponds to both DLA cores working concurrently.
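As a quick sanity check, the numbers from the original post can be compared against the table once the single-core measurements are doubled. The sketch below is only illustrative: the figures are copied from this issue, the names `RESULTS`, `dual_core_estimate` and `compare` are made up for the example, and it assumes throughput scales linearly across the two DLA cores, which real runs may not quite reach.

```python
# Throughput figures quoted in this issue, in images per second.
# Each entry is (reported in the README table, measured on ONE DLA core).
RESULTS = {
    "RetinaNet ResNeXt-50": (78, 39),
    "RetinaNet ResNet-34": (108, 53),
    "SSD-ResNet-34": (83, 41),
    "ResNet-50": (2037, 504 * 2),        # 504 qps at batch size 2
    "SSD-MobileNetV1": (2664, 655 * 2),  # 655 qps at batch size 2
}

NUM_DLA_CORES = 2  # Jetson AGX Orin has two DLA cores


def dual_core_estimate(single_core_fps, cores=NUM_DLA_CORES):
    """Estimate combined throughput, ASSUMING perfect linear scaling."""
    return single_core_fps * cores


def compare(results):
    """Return {model: (estimate, estimate / reported)} for each model."""
    rows = {}
    for model, (reported, measured) in results.items():
        estimate = dual_core_estimate(measured)
        rows[model] = (estimate, estimate / reported)
    return rows


if __name__ == "__main__":
    for model, (estimate, ratio) in compare(RESULTS).items():
        print(f"{model}: 2x DLA estimate {estimate} fps, "
              f"{ratio:.1%} of the reported value")
```

Under that assumption, every doubled measurement lands within about 2% of the reported value, which is consistent with the table showing the combined throughput of both cores.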

lynettez avatar Sep 19 '25 03:09 lynettez