Different values for accuracy (validation and testing)
Checklist
- [X] I have searched for similar issues.
- [X] I have tested with the latest development wheel.
- [X] I have checked the release documentation and the latest documentation (for master branch).
Describe the issue
I'm trying to train PointTransformer on a dataset based on S3DIS. In the last epoch of the training process, I get the following values for accuracy and mIoU:
INFO - 2022-03-11 10:36:05,257 - semantic_segmentation - Mean acc train: 0.950 eval: 0.933
INFO - 2022-03-11 10:36:05,258 - semantic_segmentation - Mean IoU train: 0.901 eval: 0.902
But when I test my model on the same sample, I get different values for accuracy and mIoU:
Overall Testing Accuracy : 0.6765742520464817, mIoU : 0.6481393184649503
I'm using pipeline.run_test() to test my model.
I suspect that accuracy is being calculated differently in these two stages.
Is there any solution to this problem?
Thank you!
Steps to reproduce the bug
# Standard Open3D-ML (PyTorch build) setup is assumed; cfg and dataset come
# from my own config file and custom S3DIS-based dataset.
import open3d.ml.torch as ml3d

model = ml3d.models.PointTransformer(**cfg.model)
pipeline = ml3d.pipelines.SemanticSegmentation(model=model, dataset=dataset, device="gpu", **cfg.pipeline)
pipeline.run_train()

# The following lines give the accuracy and mIoU of the last epoch:
print(pipeline.metric_val.acc())
print(pipeline.metric_val.iou())

# This call gives different accuracy and mIoU values on the same sample of data:
pipeline.run_test()
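As a cross-check, here is a minimal sketch (assuming the usual Open3D-ML split and inference API, i.e. dataset.get_split() / pipeline.run_inference(), and that cfg.model.num_classes is set) that recomputes overall accuracy and mIoU from a confusion matrix on one validation scene, so the numbers can be compared against metric_val and run_test():

import numpy as np

# Assumption: the dataset exposes the usual Open3D-ML split API.
split = dataset.get_split('validation')   # or 'test', depending on the config
data = split.get_data(0)                  # dict with 'point', 'feat', 'label'

results = pipeline.run_inference(data)    # expected to return 'predict_labels' / 'predict_scores'
pred = np.asarray(results['predict_labels']).ravel()
gt = np.asarray(data['label']).ravel()

num_classes = cfg.model.num_classes       # assumed to be present in the config
# Note: depending on the dataset config, predictions or labels may need an
# offset for an ignored class before they index the confusion matrix.

# Confusion-matrix based metrics, using the standard definitions of
# overall accuracy and mean IoU.
cm = np.zeros((num_classes, num_classes), dtype=np.int64)
np.add.at(cm, (gt, pred), 1)
overall_acc = np.trace(cm) / cm.sum()
iou = np.diag(cm) / (cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm) + 1e-9)
print("Overall acc:", overall_acc, "mIoU:", iou.mean())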
Error message
No response
Expected behavior
The accuracy and mIoU values from the last epoch should be equal to the values reported by run_test(), since the same samples were used for validation and testing.
We also found that the test result on S3DIS using RandLA-Net's pipeline.run_test() has a low score of ~0.3 (it is supposed to be ~0.7). When visualized, the segmentation result is not great either, so maybe it's not just the metrics.
INFO - 2022-03-14 14:41:35,236 - semantic_segmentation - Per class Accuracy : [0.953568504773729, 0.9935140520012268, 0.538902054405083, 0.06611752343180624, 0.014058350140783193, 0.0, 0.15324665441354515, 0.02713042148885172, 0.026510953703274522, 0.028663746527722873, 0.42685603261910626, 0.08113456464379948, 0.864360417519919]
INFO - 2022-03-14 14:41:35,236 - semantic_segmentation - Per class IOUs : [0.936987193881234, 0.8461924016427381, 0.47117332327507927, 0.06446300111266566, 0.013652179694598967, 0.0, 0.15159084558047756, 0.026792406032470087, 0.02648010151792873, 0.02860750170971645, 0.20241000751505328, 0.08050719112624422, 0.30335603473114814]
INFO - 2022-03-14 14:41:35,236 - semantic_segmentation - Overall Accuracy : 0.321
INFO - 2022-03-14 14:41:35,236 - semantic_segmentation - Overall IOU : 0.242
I have a similar issue using RandLA-Net. I am running a custom dataset, with great training results: ~92% accuracy and ~70% mIoU. When I attempt any form of testing, even on my validation or training data, the results are not at all representative of what the training would suggest.
Same issue here as well. I trained RandLA-Net on a custom dataset, where training achieves OA ~89% and mIoU ~0.82 on the train set, and OA ~75% and mIoU ~0.65 on the validation set. However, these metrics drop significantly when I use the run_test() function: testing on the validation set, instead of the performance I saw during training, I get OA ~35% and mIoU ~0.25.
Hello - I have a similar problem on a custom dataset, and not only with RandLA-Net. Has anyone found a likely cause for this? What have you explored in the meantime? (I see the last message dates back to Sept. '22 - did you find a way around this?) Thank you in advance!
Same problem here using KPConv trained with PyTorch on a custom dataset!