Evaluation metric

Open Mordokkai opened this issue 4 years ago • 4 comments

Hi,

Thanks a lot for sharing your code. I have a question regarding the computation of the evaluation metrics. Your code is:

from medpy import metric  # metric.binary.dc / metric.binary.hd95 come from medpy

def calculate_metric_percase(pred, gt):
    pred[pred > 0] = 1
    gt[gt > 0] = 1
    if pred.sum() > 0 and gt.sum()>0:
        dice = metric.binary.dc(pred, gt)
        hd95 = metric.binary.hd95(pred, gt)
        return dice, hd95
    elif pred.sum() > 0 and gt.sum()==0:
        return 1, 0
    else:
        return 0, 0

I understand the first "if" condition, since the Hausdorff distance requires both the ground truth and the prediction to be non-empty. However, I don't understand why you set the Dice score to 1 when the prediction is non-empty and the ground truth is empty. From the Dice formula, the score should be 0 in that case, just as it is 0 when the prediction is empty and the ground truth is not (which is what you do). Regarding the Hausdorff distance, it doesn't seem right to report a distance of 0 when the prediction or the ground truth is empty: that should be counted as a mistake, or at least excluded from the averaged metric, rather than recorded as a perfect score. What do you think? If I misunderstood your code, sorry in advance.
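To make this concrete, here is a minimal sketch (the arrays are just made-up examples) of the usual Dice definition, which gives 0 when the ground truth is empty but the prediction is not:

import numpy as np

def dice(pred, gt):
    # Dice = 2 * |pred ∩ gt| / (|pred| + |gt|)
    intersection = np.logical_and(pred, gt).sum()
    return 2.0 * intersection / (pred.sum() + gt.sum())

pred = np.array([0, 1, 1, 0])  # non-empty prediction (made-up)
gt = np.array([0, 0, 0, 0])    # empty ground truth
print(dice(pred, gt))          # 0.0, not the 1 returned above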

Mordokkai avatar Apr 15 '21 16:04 Mordokkai

I have the same question. If pred.sum()==0 and gt.sum()>0, this code sets hd95 to 0. When computing the mean HD, this makes the prediction look very good, which is contrary to the fact.
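A quick sketch with made-up numbers shows the effect on the mean:

import numpy as np

# Made-up per-organ hd95 values; the last organ was completely missed by the model.
hd95_found = [4.0, 6.0, 5.0]
mean_with_zero = np.mean(hd95_found + [0.0])  # missed organ counted as a perfect 0
mean_excluded = np.mean(hd95_found)           # missed organ excluded instead
print(mean_with_zero, mean_excluded)          # 3.75 vs 5.0: the miss lowers the mean HD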

hwei-cs avatar May 10 '21 10:05 hwei-cs

By the way, when pred.sum()==0 and gt.sum()>0 occurs, what do you think is the better way to handle this situation?

hwei-cs avatar May 10 '21 10:05 hwei-cs

I also found this problem. I modified the code according to the rules of the BraTS online evaluation:

def calculate_metric_percase(pred, gt):
    pred[pred > 0] = 1
    gt[gt > 0] = 1
    if pred.sum() > 0 and gt.sum() > 0:
        dice = metric.binary.dc(pred, gt)
        hd95 = metric.binary.hd95(pred, gt)
        return dice, hd95
    elif pred.sum() > 0 and gt.sum() == 0:
        return 0, 373.128664
    elif pred.sum() == 0 and gt.sum() > 0:
        return 0, 373.128664
    elif pred.sum() == 0 and gt.sum() == 0:
        return 1, 0

tsaiwentage avatar Aug 11 '21 08:08 tsaiwentage

@HeyWhale8 Personally, when pred.sum()==0 and gt.sum()>0, I just record NaN for the metric and ignore it when averaging. But alongside the average result, I also report the percentage of "ignored" cases, because that is still really important. I just haven't found a better way to deal with this problem. @tsaiwentage Ah ok, I didn't know some people use an arbitrarily large value; I guess that value depends on the segmentation context. Anyway, I may be wrong, but I have the feeling that this way of computing the metrics may inflate the reported results quite a lot. I think this should be fixed.
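Roughly, the aggregation looks like this (a sketch with made-up numbers, not my exact code):

import numpy as np

# Per-case hd95 values; np.nan marks cases where pred or gt was empty.
hd95_per_case = np.array([4.2, np.nan, 6.1, 5.3, np.nan])

mean_hd95 = np.nanmean(hd95_per_case)           # average over defined cases only
ignored = np.isnan(hd95_per_case).mean() * 100  # percentage of ignored cases
print(f"mean hd95 = {mean_hd95:.2f}, ignored = {ignored:.0f}%")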

Mordokkai avatar Aug 21 '21 21:08 Mordokkai