mean_average_precision

Extremely low mAP for more classes

Open alexandoikon13 opened this issue 4 years ago • 5 comments

The library seems to give an extremely low mAP (e.g. 0.0123) when using more than one class. How is this explained?

By the way, I found that when I change the recall threshold to 0.5, the results look normal and make sense for the detector's performance (e.g. 0.123).

Any suggestions and explanations would be welcome.
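Roughly what I mean (I no longer have the exact script, so the toy data and the single-threshold call below are just a sketch of the kind of change I made):

```python
import numpy as np
from mean_average_precision import MetricBuilder

# toy single-class data: [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([[10, 10, 50, 50, 0, 0, 0]])
# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([[12, 11, 49, 52, 0, 0.9]])

metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=True, num_classes=1)
metric_fn.add(preds, gt)

# VOC-style 11-point call, as in the README
print(metric_fn.value(iou_thresholds=0.5,
                      recall_thresholds=np.arange(0., 1.1, 0.1))['mAP'])

# the "single recall threshold of 0.5" variant I tried (my exact call may have differed)
print(metric_fn.value(iou_thresholds=0.5,
                      recall_thresholds=np.array([0.5]))['mAP'])
```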

alexandoikon13 avatar Aug 05 '21 12:08 alexandoikon13

@alexandoikon13 What are the per-class AP values when mAP = 0.0123? Maybe the detector performs very badly for some classes ...
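Something along these lines should print the per-class values; treat it as a sketch, since the exact layout of the dict returned by `value()` (per-class entries under each IoU-threshold key) is assumed here:

```python
import numpy as np
from mean_average_precision import MetricBuilder

# toy data with two classes: [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([[10, 10, 50, 50, 0, 0, 0],
               [60, 60, 90, 90, 1, 0, 0]])
# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([[11, 12, 49, 51, 0, 0.9],
                  [200, 200, 230, 230, 1, 0.8]])  # class 1 is missed on purpose

metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=True, num_classes=2)
metric_fn.add(preds, gt)

result = metric_fn.value(iou_thresholds=0.5)
# assumption: per-class entries sit under result[iou_threshold][class_id]['ap']
for class_id, stats in result[0.5].items():
    print(f"class {class_id}: AP = {stats['ap']}")
print(f"mAP = {result['mAP']}")
```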

bes-dev avatar Aug 05 '21 12:08 bes-dev

@bes-dev I can't really remember; I was experimenting with the dataset from the tutorial example. The value 0.0123 was just a made-up number to indicate the order of magnitude of the resulting mAP.

alexandoikon13 avatar Aug 05 '21 12:08 alexandoikon13

So, the library works correctly for more than one class on my side. If you have a reproducer that shows errors related to multiclass mAP, please provide it.

bes-dev avatar Aug 05 '21 13:08 bes-dev

Here is an example with 4 classes, where the COCO mAP is much lower than the VOC PASCAL mAP. It doesn't make sense to me that it is that low.

```python
import numpy as np
from mean_average_precision import MetricBuilder

# [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([
    [439, 157, 556, 241, 0, 0, 0],
    [437, 246, 518, 351, 1, 0, 0],
    [515, 306, 595, 375, 2, 0, 0],
    [407, 386, 531, 476, 1, 0, 0],
    [544, 419, 621, 476, 0, 0, 0],
    [609, 297, 636, 392, 3, 0, 0],
])

# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([
    [429, 219, 528, 247, 2, 0.860851],
    [433, 260, 506, 336, 1, 0.769833],
    [518, 314, 603, 369, 0, 0.662608],
    [592, 310, 634, 388, 3, 0.798196],
    [403, 384, 517, 461, 0, 0.982881],
    [405, 429, 519, 470, 0, 0.669369],
    [433, 272, 499, 341, 1, 0.772826],
    [413, 390, 515, 459, 2, 0.619459],
])

# print list of available metrics
print(MetricBuilder.get_metrics_list())

# create metric_fn and add the sample
metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=True, num_classes=4)
metric_fn.add(preds, gt)

# compute PASCAL VOC metric
print(f"VOC PASCAL mAP: {metric_fn.value(iou_thresholds=0.5, recall_thresholds=np.arange(0., 1.1, 0.1))['mAP']}")

# compute PASCAL VOC metric at all points
print(f"VOC PASCAL mAP in all points: {metric_fn.value(iou_thresholds=0.5)['mAP']}")

# compute COCO metric
print(f"COCO mAP: {metric_fn.value(iou_thresholds=np.arange(0.5, 1.0, 0.05), recall_thresholds=np.arange(0., 1.01, 0.01), mpolicy='soft')['mAP']}")
```

Results:

```
VOC PASCAL mAP: 0.3181818127632141
VOC PASCAL mAP in all points: 0.3125
COCO mAP: 0.03762376308441162
```
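To see where the COCO number comes from, the same `metric_fn` can be evaluated at each IoU threshold separately; COCO mAP averages over IoU 0.50:0.95, so the strict thresholds where AP collapses pull the mean down. A rough sketch, reusing `metric_fn` from the snippet above:

```python
# sweep the IoU thresholds that COCO averages over
for iou in np.arange(0.5, 1.0, 0.05):
    m = metric_fn.value(iou_thresholds=float(iou),
                        recall_thresholds=np.arange(0., 1.01, 0.01),
                        mpolicy='soft')['mAP']
    print(f"IoU {iou:.2f}: mAP = {m:.4f}")
```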

###########################################################

Another example with 6 classes and more ground-truth and predicted bboxes:

```python
import numpy as np
from mean_average_precision import MetricBuilder

# [xmin, ymin, xmax, ymax, class_id, difficult, crowd]
gt = np.array([
    [439, 157, 556, 241, 0, 0, 0],
    [437, 246, 518, 351, 1, 0, 0],
    [515, 306, 595, 375, 2, 0, 0],
    [407, 386, 531, 476, 1, 0, 0],
    [544, 419, 621, 476, 0, 0, 0],
    [609, 297, 636, 392, 3, 0, 0],
    [234, 562, 321, 543, 4, 0, 0],
    [456, 613, 632, 512, 4, 0, 0],
    [333, 444, 444, 333, 5, 0, 0],
    [549, 401, 608, 399, 5, 0, 0],
    [419, 389, 509, 419, 4, 0, 0],
    [511, 388, 592, 168, 3, 0, 0],
])

# [xmin, ymin, xmax, ymax, class_id, confidence]
preds = np.array([
    [429, 219, 528, 247, 2, 0.860851],
    [433, 260, 506, 336, 1, 0.769833],
    [518, 314, 603, 369, 0, 0.662608],
    [592, 310, 634, 388, 3, 0.798196],
    [403, 384, 517, 461, 0, 0.982881],
    [405, 429, 519, 470, 0, 0.669369],
    [433, 272, 499, 341, 1, 0.772826],
    [413, 390, 515, 459, 2, 0.619459],
    [418, 401, 551, 459, 5, 0.719459],
    [332, 401, 414, 498, 5, 0.819459],
    [301, 390, 345, 435, 4, 0.519459],
    [543, 601, 521, 681, 4, 0.919459],
    [389, 390, 498, 476, 2, 0.769459],
    [589, 452, 619, 524, 3, 0.879459],
    [418, 345, 501, 410, 0, 0.909459],
    [482, 476, 517, 589, 3, 0.669459],
])

# print list of available metrics
print(MetricBuilder.get_metrics_list())

# create metric_fn and add the sample
metric_fn = MetricBuilder.build_evaluation_metric("map_2d", async_mode=True, num_classes=2)
metric_fn.add(preds, gt)

# compute PASCAL VOC metric
print(f"VOC PASCAL mAP: {metric_fn.value(iou_thresholds=0.5, recall_thresholds=np.arange(0., 1.1, 0.1))['mAP']}")

# compute PASCAL VOC metric at all points
print(f"VOC PASCAL mAP in all points: {metric_fn.value(iou_thresholds=0.5)['mAP']}")

# compute COCO metric
print(f"COCO mAP: {metric_fn.value(iou_thresholds=np.arange(0.5, 1.0, 0.05), recall_thresholds=np.arange(0., 1.01, 0.01), mpolicy='soft')['mAP']}")
```

Results:

```
VOC PASCAL mAP: 0.13636364042758942
VOC PASCAL mAP in all points: 0.125
COCO mAP: 0.025247525423765182
```

I hope that helps clarify my question. Also, why do you use a range of recall thresholds?
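My current understanding (please correct me) is that the recall thresholds implement interpolated average precision: VOC-style AP samples the best achievable precision at 11 recall points (0, 0.1, ..., 1.0), while COCO uses 101 points (0, 0.01, ..., 1.0). A toy sketch of that idea, independent of this library:

```python
import numpy as np

def interpolated_ap(precision, recall, recall_thresholds):
    """Average of the best precision achievable at recall >= each threshold."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    interpolated = []
    for t in recall_thresholds:
        mask = recall >= t
        interpolated.append(precision[mask].max() if mask.any() else 0.0)
    return float(np.mean(interpolated))

# toy precision/recall curve
prec = [1.0, 1.0, 0.67, 0.75, 0.6]
rec  = [0.2, 0.4, 0.4, 0.6, 0.6]

print(interpolated_ap(prec, rec, np.arange(0., 1.1, 0.1)))    # VOC-style 11 points
print(interpolated_ap(prec, rec, np.arange(0., 1.01, 0.01)))  # COCO-style 101 points
```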

alexandoikon13 avatar Aug 06 '21 09:08 alexandoikon13

I experience a similar issue. It seems that the evaluator presets the AP of classes that are empty (not present in gt or preds?) to 0, and those zeros factor into the mean.
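A toy illustration of what I mean, assuming the mean really is taken over all `num_classes` entries:

```python
import numpy as np

# suppose only 2 of 6 registered classes actually appear in gt/preds
per_class_ap = {0: 0.75, 1: 0.60, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0}

map_over_all = np.mean(list(per_class_ap.values()))                     # zeros included
map_over_present = np.mean([ap for ap in per_class_ap.values() if ap > 0])

print(f"mAP over all 6 classes:   {map_over_all:.3f}")      # 0.225
print(f"mAP over present classes: {map_over_present:.3f}")  # 0.675
```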

michaelyhuang23 avatar Aug 25 '21 05:08 michaelyhuang23