Extremely low mAP with more than one class
The library seems to give extremely low mAP values (e.g. 0.0123) when using more than one class. How can this be explained?
By the way, I found that when I change the recall threshold to 0.5, the results look normal and make sense for the model's performance (e.g. 0.123).
Any suggestions and explanations would be welcome.
@alexandoikon13 what are the per-class AP values when mAP = 0.0123? Maybe the detector just performs badly for some classes...
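For anyone else hitting this, something like the sketch below should print the per-class APs. I'm assuming the dict returned by `metric_fn.value()` is keyed by IoU threshold and then by class id; verify against your installed version, since the exact structure may differ.

```python
# Sketch: print per-class AP, not just the aggregate mAP.
# Assumption: metric_fn.value() returns
# {"mAP": float, iou_threshold: {class_id: {"ap": ...}, ...}}
# -- check this against the installed version of the library.
result = metric_fn.value(iou_thresholds=0.5)
for key, per_class in result.items():
    if key == "mAP":
        continue  # skip the aggregate entry
    for class_id, stats in per_class.items():
        print(f"IoU={key} class={class_id} AP={stats['ap']}")
print(f"mAP: {result['mAP']}")
```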
@bes-dev I can't really remember; I was experimenting with the dataset from the tutorial example. The value 0.0123 was just an arbitrary number illustrating the order of magnitude of the mAP I was getting.
So, the library works correctly for more than one class on my side. If you have a reproducer with errors related to multi-class mAP, please provide it.
Here is one example with 4 classes, where the COCO mAP comes out much lower than the VOC PASCAL mAP. It doesn't make sense to me for it to be that low.
```python
import numpy as np
from mean_average_precision import MetricBuilder

# [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([
    [439, 157, 556, 241, 0, 0, 0],
    [437, 246, 518, 351, 1, 0, 0],
    [515, 306, 595, 375, 2, 0, 0],
    [407, 386, 531, 476, 1, 0, 0],
    [544, 419, 621, 476, 0, 0, 0],
    [609, 297, 636, 392, 3, 0, 0],
])

# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([
    [429, 219, 528, 247, 2, 0.860851],
    [433, 260, 506, 336, 1, 0.769833],
    [518, 314, 603, 369, 0, 0.662608],
    [592, 310, 634, 388, 3, 0.798196],
    [403, 384, 517, 461, 0, 0.982881],
    [405, 429, 519, 470, 0, 0.669369],
    [433, 272, 499, 341, 1, 0.772826],
    [413, 390, 515, 459, 2, 0.619459],
])

# print list of available metrics
print(MetricBuilder.get_metrics_list())

# create metric_fn
metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=True, num_classes=4)
metric_fn.add(preds, gt)

# compute PASCAL VOC metric
print(f"VOC PASCAL mAP: {metric_fn.value(iou_thresholds=0.5, recall_thresholds=np.arange(0., 1.1, 0.1))['mAP']}")

# compute PASCAL VOC metric at all points
print(f"VOC PASCAL mAP in all points: {metric_fn.value(iou_thresholds=0.5)['mAP']}")

# compute COCO metric
print(f"COCO mAP: {metric_fn.value(iou_thresholds=np.arange(0.5, 1.0, 0.05), recall_thresholds=np.arange(0., 1.01, 0.01), mpolicy='soft')['mAP']}")
```

Results:

```
VOC PASCAL mAP: 0.3181818127632141
VOC PASCAL mAP in all points: 0.3125
COCO mAP: 0.03762376308441162
```
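A much lower COCO number than VOC is actually expected here: COCO averages AP over the ten IoU thresholds 0.50 to 0.95, so a detection with only moderate overlap stops counting as a true positive at the stricter thresholds. A quick standalone check (plain Python arithmetic, not the library's matcher) on the class-1 pair from the example above illustrates this:

```python
def iou(a, b):
    """IoU of two [xmin, ymin, xmax, ymax] boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

# class-1 ground truth vs the highest-confidence class-1 prediction
print(iou([437, 246, 518, 351], [433, 260, 506, 336]))  # ~0.595
# TP at IoU thresholds 0.50 and 0.55, FP at 0.60-0.95:
# 8 of the 10 COCO thresholds score this match as a miss.
```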
---
Another example, with 6 classes and more ground-truth and predicted bboxes:

```python
import numpy as np
from mean_average_precision import MetricBuilder

# [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([
    [439, 157, 556, 241, 0, 0, 0],
    [437, 246, 518, 351, 1, 0, 0],
    [515, 306, 595, 375, 2, 0, 0],
    [407, 386, 531, 476, 1, 0, 0],
    [544, 419, 621, 476, 0, 0, 0],
    [609, 297, 636, 392, 3, 0, 0],
    [234, 562, 321, 543, 4, 0, 0],
    [456, 613, 632, 512, 4, 0, 0],
    [333, 444, 444, 333, 5, 0, 0],
    [549, 401, 608, 399, 5, 0, 0],
    [419, 389, 509, 419, 4, 0, 0],
    [511, 388, 592, 168, 3, 0, 0],
])

# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([
    [429, 219, 528, 247, 2, 0.860851],
    [433, 260, 506, 336, 1, 0.769833],
    [518, 314, 603, 369, 0, 0.662608],
    [592, 310, 634, 388, 3, 0.798196],
    [403, 384, 517, 461, 0, 0.982881],
    [405, 429, 519, 470, 0, 0.669369],
    [433, 272, 499, 341, 1, 0.772826],
    [413, 390, 515, 459, 2, 0.619459],
    [418, 401, 551, 459, 5, 0.719459],
    [332, 401, 414, 498, 5, 0.819459],
    [301, 390, 345, 435, 4, 0.519459],
    [543, 601, 521, 681, 4, 0.919459],
    [389, 390, 498, 476, 2, 0.769459],
    [589, 452, 619, 524, 3, 0.879459],
    [418, 345, 501, 410, 0, 0.909459],
    [482, 476, 517, 589, 3, 0.669459],
])

# print list of available metrics
print(MetricBuilder.get_metrics_list())

# create metric_fn
metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=True, num_classes=6)
metric_fn.add(preds, gt)

# compute PASCAL VOC metric
print(f"VOC PASCAL mAP: {metric_fn.value(iou_thresholds=0.5, recall_thresholds=np.arange(0., 1.1, 0.1))['mAP']}")

# compute PASCAL VOC metric at all points
print(f"VOC PASCAL mAP in all points: {metric_fn.value(iou_thresholds=0.5)['mAP']}")

# compute COCO metric
print(f"COCO mAP: {metric_fn.value(iou_thresholds=np.arange(0.5, 1.0, 0.05), recall_thresholds=np.arange(0., 1.01, 0.01), mpolicy='soft')['mAP']}")
```

Results:

```
VOC PASCAL mAP: 0.13636364042758942
VOC PASCAL mAP in all points: 0.125
COCO mAP: 0.025247525423765182
```
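One thing worth flagging in this second example: several of the ground-truth boxes are degenerate, e.g. `[234, 562, 321, 543]` has ymax < ymin, which gives a negative height and can silently break IoU matching. A quick sanity check before calling `add()` might look like this (a hypothetical helper of mine, not part of the library):

```python
import numpy as np

def check_boxes(boxes, name):
    """Warn about boxes whose xmax <= xmin or ymax <= ymin."""
    bad = (boxes[:, 2] <= boxes[:, 0]) | (boxes[:, 3] <= boxes[:, 1])
    for row in boxes[bad]:
        print(f"degenerate box in {name}: {row[:4].astype(int).tolist()}")

check_boxes(gt, "gt")        # flags e.g. [234, 562, 321, 543]
check_boxes(preds, "preds")  # flags e.g. [543, 601, 521, 681]
```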
Hope that helps clarify my question. Also, why do you use a range of recall thresholds?
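For context on the recall-threshold question: the classic PASCAL VOC metric is the 11-point interpolated AP, which samples the interpolated precision at recalls 0.0, 0.1, ..., 1.0 and averages them (COCO uses 101 points, 0.00 to 1.00). A minimal standalone sketch of that interpolation, as my own illustration rather than the library's internals:

```python
import numpy as np

def interpolated_ap(recalls, precisions, recall_thresholds):
    """N-point interpolated AP: at each recall threshold, take the
    maximum precision achieved at any recall >= that threshold,
    then average over all thresholds."""
    ap = 0.0
    for t in recall_thresholds:
        mask = recalls >= t
        ap += (precisions[mask].max() if mask.any() else 0.0)
    return ap / len(recall_thresholds)

recalls = np.array([0.2, 0.4, 0.6, 0.8])
precisions = np.array([1.0, 0.8, 0.6, 0.5])
print(interpolated_ap(recalls, precisions, np.arange(0., 1.1, 0.1)))  # VOC-style, ~0.62
```

When `recall_thresholds` is omitted, as in the "in all points" call above, the AP is computed over the full precision-recall curve instead of a fixed grid.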
I experience a similar issue. It seems that the evaluator presets the AP of empty classes (classes not present in gt or preds?) to 0, and that factors into the mean.
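If that is what's happening, a toy example shows how strongly empty classes would dilute the mean (plain numpy, not library code):

```python
import numpy as np

# Hypothetical per-class APs: two classes matched reasonably well,
# two classes got no matches, so their AP was preset to 0.
aps = np.array([0.6, 0.4, 0.0, 0.0])

print(aps.mean())           # 0.25 -- mean over all classes drags mAP down
print(aps[aps > 0].mean())  # 0.5  -- mean over populated classes only
```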