Performance mismatch on CelebAMask-HQ test set
Hello, thank you for your great work.
However, when I run the face parsing model on the CelebAMask-HQ test set (2,824 images), I get the results below.
The model I ran is farl/celebm/448 (face_parsing.farl.celebm.main_ema_181500_jit.pt). Per-class F1 scores: {'background': 0.9343307778499743, 'skin': 0.9641438432481969, 'nose': 0.9377685027511485, 'eye_g': 0.8991579940116652, 'l_eye': 0.8797685119013225, 'r_eye': 0.8815088490017493, 'l_brow': 0.8546936399701022, 'r_brow': 0.8517906024905171, 'l_ear': 0.8826971414311515, 'r_ear': 0.8796045818209585, 'mouth': 0.9227481788076385, 'u_lip': 0.8879356316268103, 'l_lip': 0.9040920760745508, 'hair': 0.935249390735524, 'hat': 0.8693470068443545, 'ear_r': 0.697250254530866, 'neck_l': 0.3732396631852335, 'neck': 0.8658552106253891, 'cloth': 0.8273804800814614, 'fg_mean': 0.8507906421743688}
The mean F1 (fg_mean) is 85.08, which does not match the 89.56 reported in the paper.
Moreover, the necklace (neck_l) F1 is very low at 37.32, far below the 69.72 in the paper.
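For what it's worth, the reported fg_mean is consistent with a plain average over the 18 foreground classes, so the aggregation itself looks fine (a quick sanity check; I am assuming fg_mean excludes only the background class):

```python
# Per-class F1 scores from the run above; background is excluded from fg_mean.
f1 = {
    'skin': 0.9641438432481969, 'nose': 0.9377685027511485,
    'eye_g': 0.8991579940116652, 'l_eye': 0.8797685119013225,
    'r_eye': 0.8815088490017493, 'l_brow': 0.8546936399701022,
    'r_brow': 0.8517906024905171, 'l_ear': 0.8826971414311515,
    'r_ear': 0.8796045818209585, 'mouth': 0.9227481788076385,
    'u_lip': 0.8879356316268103, 'l_lip': 0.9040920760745508,
    'hair': 0.935249390735524, 'hat': 0.8693470068443545,
    'ear_r': 0.697250254530866, 'neck_l': 0.3732396631852335,
    'neck': 0.8658552106253891, 'cloth': 0.8273804800814614,
}

# Unweighted mean over the 18 foreground classes.
fg_mean = sum(f1.values()) / len(f1)
print(round(fg_mean, 4))  # → 0.8508, matching the reported fg_mean
```

So the gap to the paper must come from the per-class scores themselves (especially neck_l), not from how they are averaged.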
Could you help me figure out the reason?