Different values for accuracy (validation and testing)
Checklist
- [X] I have searched for similar issues.
- [X] I have tested with the latest development wheel.
- [X] I have checked the release documentation and the latest documentation (for master branch).
Describe the issue
I'm trying to train PointTransformer on a dataset based on S3DIS. In the last epoch of the training process, I get the following values for accuracy and mIoU:
INFO - 2022-03-11 10:36:05,257 - semantic_segmentation - Mean acc train: 0.950 eval: 0.933
INFO - 2022-03-11 10:36:05,258 - semantic_segmentation - Mean IoU train: 0.901 eval: 0.902
But when I test my model on the same sample, I get different values for accuracy and mIoU:
Overall Testing Accuracy : 0.6765742520464817, mIoU : 0.6481393184649503
I'm using pipeline.run_test() to test my model.
I suspect that accuracy is being calculated differently in these two stages.
Is there any solution to this problem?
Thank you!
Steps to reproduce the bug
# Standard Open3D-ML (PyTorch build) setup is assumed; cfg and dataset come
# from my own config file and custom S3DIS-based dataset.
import open3d.ml.torch as ml3d

model = ml3d.models.PointTransformer(**cfg.model)
pipeline = ml3d.pipelines.SemanticSegmentation(model=model, dataset=dataset, device="gpu", **cfg.pipeline)
pipeline.run_train()

# The following lines give the accuracy and mIoU of the last epoch:
print(pipeline.metric_val.acc())
print(pipeline.metric_val.iou())

# This call gives different accuracy and mIoU values on the same sample of data:
pipeline.run_test()
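As a cross-check, here is a minimal sketch (assuming the usual Open3D-ML split and inference API, i.e. dataset.get_split() / pipeline.run_inference(), and that cfg.model.num_classes is set) that recomputes overall accuracy and mIoU from a confusion matrix on one validation scene, so the numbers can be compared against metric_val and run_test():

import numpy as np

# Assumption: the dataset exposes the usual Open3D-ML split API.
split = dataset.get_split('validation')   # or 'test', depending on the config
data = split.get_data(0)                  # dict with 'point', 'feat', 'label'

results = pipeline.run_inference(data)    # expected to return 'predict_labels' / 'predict_scores'
pred = np.asarray(results['predict_labels']).ravel()
gt = np.asarray(data['label']).ravel()

num_classes = cfg.model.num_classes       # assumed to be present in the config
# Note: depending on the dataset config, predictions or labels may need an
# offset for an ignored class before they index the confusion matrix.

# Confusion-matrix based metrics, using the standard definitions of
# overall accuracy and mean IoU.
cm = np.zeros((num_classes, num_classes), dtype=np.int64)
np.add.at(cm, (gt, pred), 1)
overall_acc = np.trace(cm) / cm.sum()
iou = np.diag(cm) / (cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm) + 1e-9)
print("Overall acc:", overall_acc, "mIoU:", iou.mean())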
Error message
No response
Expected behavior
The accuracy and mIoU values from the last epoch should be equal to the values reported by run_test(), since the same samples were used for validation and testing.
We also found that the test result on S3DIS using RandLA-Net's pipeline.run_test() has a low score of ~0.3 (it is supposed to be ~0.7). When visualized, the segmentation result is not great either, so maybe it's not just the metrics.
INFO - 2022-03-14 14:41:35,236 - semantic_segmentation - Per class Accuracy : [0.953568504773729, 0.9935140520012268, 0.538902054405083, 0.06611752343180624, 0.014058350140783193, 0.0, 0.15324665441354515, 0.02713042148885172, 0.026510953703274522, 0.028663746527722873, 0.42685603261910626, 0.08113456464379948, 0.864360417519919]
INFO - 2022-03-14 14:41:35,236 - semantic_segmentation - Per class IOUs : [0.936987193881234, 0.8461924016427381, 0.47117332327507927, 0.06446300111266566, 0.013652179694598967, 0.0, 0.15159084558047756, 0.026792406032470087, 0.02648010151792873, 0.02860750170971645, 0.20241000751505328, 0.08050719112624422, 0.30335603473114814]
INFO - 2022-03-14 14:41:35,236 - semantic_segmentation - Overall Accuracy : 0.321
INFO - 2022-03-14 14:41:35,236 - semantic_segmentation - Overall IOU : 0.242
I have a similar issue using RandLA-Net. I am running a custom dataset, with great training results: ~92% accuracy and ~70% mIoU. When I attempt any form of testing, even on my validation or training data, the results are not at all representative of what the training would suggest.
Same issue here as well. I trained RandLA-Net on a custom dataset, where training achieves OA ~89% and mIoU ~0.82 on the train set, and OA ~75% and mIoU ~0.65 on the validation set. However, these metrics drop significantly when I use the run_test() function: testing on the validation set, instead of the performance I saw during training, I get OA ~35% and mIoU ~0.25.
Hello - I have a similar problem on a custom dataset, and not only with RandLA-Net. Has anyone found a likely cause for this? What have you explored in the meantime? (I see the last message dates back to Sept. '22 - did you find a way around this?) Thank you in advance!
Same problem here using KPConv trained with PyTorch on a custom dataset!