Poor performance in Domain D and ASSD discrepancy on the MMS dataset
Dear Author,
I am very interested in your work and am currently trying to replicate your results. However, I have run into some issues and would like to ask for your advice. In my experiments on the MMS dataset, the Dice scores for Domains B and C are close to the values reported in your paper, with an average difference of less than three percentage points for each metric. The ASSD metric, however, differs significantly from your results. In addition, performance in Domain D is notably worse than in Domains B and C.
Could you please provide any suggestions on how to address these issues? My experimental results are as follows:
Thank you for taking the time to respond. Wishing you continued success in your research!
I seem to have found the problem. The original test function in test_run.py leaves the model in train mode. After I changed it to eval mode, the results improved significantly.
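
For anyone hitting the same thing, here is a minimal sketch of the kind of change I mean. It assumes a PyTorch model; `evaluate`, `toy_model`, and `toy_batches` are hypothetical names used only for illustration and are not taken from test_run.py. In train mode, layers such as Dropout and BatchNorm behave differently (random masking, per-batch statistics), which can distort test metrics if the network contains them.

```python
import torch
import torch.nn as nn


def evaluate(model: nn.Module, batches):
    """Run inference with the model temporarily switched to eval mode."""
    was_training = model.training
    model.eval()  # Dropout becomes a no-op; BatchNorm uses its running statistics
    outputs = []
    with torch.no_grad():  # gradients are not needed at test time
        for x in batches:
            outputs.append(model(x))
    model.train(was_training)  # restore whatever mode the caller had set
    return outputs


# Toy stand-in network (an assumption, not the repository's model).
torch.manual_seed(0)
toy_model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.BatchNorm2d(4),
    nn.Dropout(0.5),
)
toy_batches = [torch.randn(2, 1, 8, 8) for _ in range(2)]

first = evaluate(toy_model, toy_batches)
second = evaluate(toy_model, toy_batches)

# Check that repeated evaluation of the same inputs gives identical outputs.
repeatable = all(torch.equal(a, b) for a, b in zip(first, second))
print(repeatable)
```

In the actual script, the equivalent fix is probably just calling `model.eval()` before the test loop, ideally inside `torch.no_grad()`.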