Performance mismatch on CelebAMask-HQ test set
Hello, thank you for your great work.
However, when I run the face parsing model on the CelebAMask-HQ test set (2,824 images), I get the results below.
The model I ran is farl/celebm/448 (face_parsing.farl.celebm.main_ema_181500_jit.pt). Per-class F1 scores: {'background': 0.9343307778499743, 'skin': 0.9641438432481969, 'nose': 0.9377685027511485, 'eye_g': 0.8991579940116652, 'l_eye': 0.8797685119013225, 'r_eye': 0.8815088490017493, 'l_brow': 0.8546936399701022, 'r_brow': 0.8517906024905171, 'l_ear': 0.8826971414311515, 'r_ear': 0.8796045818209585, 'mouth': 0.9227481788076385, 'u_lip': 0.8879356316268103, 'l_lip': 0.9040920760745508, 'hair': 0.935249390735524, 'hat': 0.8693470068443545, 'ear_r': 0.697250254530866, 'neck_l': 0.3732396631852335, 'neck': 0.8658552106253891, 'cloth': 0.8273804800814614, 'fg_mean': 0.8507906421743688}
The mean F1 (fg_mean) is 85.08, which does not match the 89.56 reported in the paper.
Moreover, the necklace (neck_l) F1 is very low at 37.32, far below the 69.72 in the paper.
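For what it's worth, the reported fg_mean is consistent with a plain average over the 18 foreground classes, so the aggregation itself looks fine (a quick sanity check; I am assuming fg_mean excludes only the background class):

```python
# Per-class F1 scores from the run above; background is excluded from fg_mean.
f1 = {
    'skin': 0.9641438432481969, 'nose': 0.9377685027511485,
    'eye_g': 0.8991579940116652, 'l_eye': 0.8797685119013225,
    'r_eye': 0.8815088490017493, 'l_brow': 0.8546936399701022,
    'r_brow': 0.8517906024905171, 'l_ear': 0.8826971414311515,
    'r_ear': 0.8796045818209585, 'mouth': 0.9227481788076385,
    'u_lip': 0.8879356316268103, 'l_lip': 0.9040920760745508,
    'hair': 0.935249390735524, 'hat': 0.8693470068443545,
    'ear_r': 0.697250254530866, 'neck_l': 0.3732396631852335,
    'neck': 0.8658552106253891, 'cloth': 0.8273804800814614,
}

# Unweighted mean over the 18 foreground classes.
fg_mean = sum(f1.values()) / len(f1)
print(round(fg_mean, 4))  # → 0.8508, matching the reported fg_mean
```

So the gap to the paper must come from the per-class scores themselves (especially neck_l), not from how they are averaged.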
Could you help me figure out the reason?