Evaluating AHIQ on traditional IQA datasets

Open ch-andrei opened this issue 3 years ago • 1 comment

A question about the results in the original paper: how is the evaluation on traditional datasets (LIVE, CSIQ, TID) performed? Do you report average performance over K runs? The paper only mentions that the datasets are split 60-20-20 into train/val/test. Could you add a more detailed description of the protocol?
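To make the question concrete, here is a minimal sketch of one protocol that is common in the IQA literature: repeat a random 60-20-20 split K times, split by reference image so that distorted versions of the same content never appear in both train and test, and report the median score across runs. This is only a guess at what the paper might have done, not a confirmed description of it. The names `split_by_reference`, `evaluate_over_runs`, and the `train_and_score` callback are hypothetical and do not come from the AHIQ repository.

```python
import random
from statistics import median


def split_by_reference(ref_ids, seed, ratios=(0.6, 0.2, 0.2)):
    """Split reference-image IDs into train/val/test lists.

    Splitting by reference (rather than by distorted image) keeps all
    distorted versions of one content inside a single subset.
    """
    refs = sorted(set(ref_ids))
    rng = random.Random(seed)  # seeded so every run is reproducible
    rng.shuffle(refs)
    n_train = round(ratios[0] * len(refs))
    n_val = round(ratios[1] * len(refs))
    train = refs[:n_train]
    val = refs[n_train:n_train + n_val]
    test = refs[n_train + n_val:]  # remainder goes to test
    return train, val, test


def evaluate_over_runs(ref_ids, train_and_score, k=10):
    """Run k random splits and return the median test score.

    `train_and_score(train, val, test)` is a hypothetical callback that
    trains a model and returns one number (e.g. SRCC on the test refs).
    """
    scores = []
    for seed in range(k):
        train, val, test = split_by_reference(ref_ids, seed)
        scores.append(train_and_score(train, val, test))
    return median(scores)


# Usage with a dummy callback: 25 reference IDs, as in TID2013.
tid_refs = list(range(1, 26))
dummy = lambda train, val, test: len(test) / len(tid_refs)
result = evaluate_over_runs(tid_refs, dummy, k=5)
```

Whether the reported numbers are a mean, a median, or a single run, and whether the split is by reference or by distorted image, is exactly the detail the paper leaves open.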

Jun 20 '22 18:06 ch-andrei

I have the same question, especially after testing on the whole TID2013 dataset with the code 'test.py'. The results are unsatisfactory: both the PLCC and SRCC scores are very low. Can anyone explain how to run the evaluation correctly?
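For anyone comparing numbers, the two metrics can be checked independently of 'test.py'. Below is a minimal pure-Python sketch of PLCC (Pearson linear correlation) and SRCC (Spearman rank correlation, with tied values given their average rank). The function names are hypothetical and this is not the repository's code. It also does not apply the nonlinear logistic mapping that some IQA papers fit before computing PLCC.

```python
from math import sqrt


def pearson(x, y):
    """PLCC: Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """SRCC: Pearson correlation computed on the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up predictions and subjective scores, purely for illustration.
predictions = [0.10, 0.40, 0.35, 0.80]
mos = [1.0, 3.0, 2.5, 5.0]
plcc = pearson(predictions, mos)
srcc = spearman(predictions, mos)
```

If SciPy is available, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same quantities. One thing worth checking when scores look wrong is the label direction: TID2013 provides MOS (higher is better), while LIVE and CSIQ provide DMOS (lower is better), so a model trained on one convention can produce negative correlations on the other.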

Aug 17 '23 10:08 Buka-Xing