
NYU test results not as good as they should be

Open MeteorsHub opened this issue 7 years ago • 17 comments

After downloading the NYU dataset and your pretrained model, I ran the command python model/hourglass_um_crop_tiny.py --dataset 'nyu' --batch_size 3 --num_stack 2 --num_fea 128 --debug_level 2 with is_train set to False in the code. But the results are not as good as they should be:

[2018-04-19 16:06:11.933652]: 2727/393226 computed, with 0.10s
finish test
10mm percentage: 0.012239
20mm percentage: 0.057683
30mm percentage: 0.072225
40mm percentage: 0.095613

This is what I see in tensorboard:

Ground Truth: snipaste_2018-04-19_16-14-50 Val: snipaste_2018-04-19_16-15-07

And these are the generated result files: testing-2018-04-19_15_59_22.194820-result.txt testing-2018-04-19_15_59_22.194820-result_error.txt

What's the problem? Did I do something wrong with the code?

My environment:

  • Ubuntu 16.04
  • GTX titan xp 12GB
  • cuda9.0 + cudnn7
  • python 2.7
  • tensorflow 1.7
  • tfplot 0.2.0
  • matplotlib 2.2.2
  • opencv 2.4.11

MeteorsHub avatar Apr 19 '18 08:04 MeteorsHub

Hi, I've downloaded the code and model, run them again, and the result is as expected. Judging from the tensorboard visualization, something may have gone wrong in the preprocessing step, or some code may have been changed.

melonwan avatar Apr 19 '18 08:04 melonwan

You mean there is something wrong with data/nyu.py? I just ran this file after downloading the NYU dataset.

MeteorsHub avatar Apr 19 '18 09:04 MeteorsHub

No, the original code should run correctly. Sorry for my ambiguous explanation; I just meant that some of the original code might have been changed on your side, which would explain the difference.

melonwan avatar Apr 19 '18 09:04 melonwan

I cloned your code again and ran it from the beginning, but it produced the same results. I think you should resolve issue #4 so that we can evaluate the ICVL and MSRA datasets and see whether the results are good for them.

MeteorsHub avatar Apr 19 '18 13:04 MeteorsHub

That's pretty strange, as I also cloned and ran the code this morning and it shows the correct result. Are you sure you followed the same steps, with no code changes, when preparing the tf-record file?

melonwan avatar Apr 19 '18 13:04 melonwan

Yes, I ran it several times and made sure the code I modified has nothing to do with the data or the network. Maybe it's my environment or dataset corruption. So I will use another PC to rerun it and check the other two datasets.

MeteorsHub avatar Apr 19 '18 13:04 MeteorsHub

I ran the code with ICVL and the result is normal, as expected. I don't know why it's strange on NYU. I found that the images on tensorboard have a strange vertical line; do you see the same when you run it? Thanks a lot for your effort!

Another code modification suggestion: delete return True on line 51 in data/icvl.py (inside is_train()), or it will not load the test data.

MeteorsHub avatar Apr 20 '18 05:04 MeteorsHub

I got it. Your bbox selection code has a bug. Maybe your nyu_bbox.pkl is different from the uploaded one. Change lines 107-121 in data/nyu.py to:

if self.subset == 'testing':
    with open('data/nyu_bbx.pkl', 'rb') as f:
        bbxes = [cPickle.load(f)]

self._annotations = []
if self.subset == 'testing':
    for c_j, c_n, c_b in zip(joints, names, bbxes):
        for j, n, b in zip(c_j, c_n, c_b):
            j = j.reshape((-1, 3))
            j[:, 1] *= -1.0
            j = j.reshape((-1,))
            if is_trun:
                j = j[self.keep_pose_idx]
            b = np.asarray(b).reshape((-1,))
            self._annotations.append(Annotation(n, j.reshape((-1,)), b))

MeteorsHub avatar Apr 21 '18 12:04 MeteorsHub

Hi, thank you for telling me about that. Actually, this is not a "bug" but an implementation trick. We only estimated the bounding box of view 1, which is also the view used for evaluation, and assigned the same bounding box to the remaining views. That box is not the correct one for those views, but it still produces the same result. I've run the code on my side exactly as it is online.

melonwan avatar Apr 21 '18 19:04 melonwan

I don't understand. Is 'view' a camera view or one frame? If a camera view (there are 3 camera views in the test set), your code doesn't evaluate camera views 2 and 3. If a frame, your code uses the box of frame 1 to crop all the remaining frames. But then it will not crop the correct hand locations, because the hand moves in subsequent frames.

MeteorsHub avatar Apr 22 '18 04:04 MeteorsHub

Hey, 'view' means camera view. Yes, the code only evaluates view 1, as this is the standard evaluation protocol.

melonwan avatar Apr 22 '18 07:04 melonwan

So, you only take the bbox of frame 1 of view 1 to crop all frames of that view in your code. That's what I think is inaccurate.

MeteorsHub avatar Apr 22 '18 10:04 MeteorsHub

No, it is the per-frame bbx of view 1, reused as the bbx for views 1, 2, and 3. That is wrong for views 2 and 3, but it works correctly on view 1, which is the evaluation view.

melonwan avatar Apr 23 '18 12:04 melonwan

I mean in your original code

for c_j, c_n, c_b in zip(joints, names, bbxes):
    for j, n in zip(c_j, c_n):

len(joints)=1, len(names)=1, len(bbxes)=8252. Because zip() truncates to the shortest input, you will never reach the bbxes of frames 2-8252. So I changed your code as mentioned above.
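The truncation being described can be reproduced in isolation. This is a minimal sketch with stand-in values (three boxes instead of 8252); the variable names mirror the snippet above but the data is illustrative:

```python
# zip() stops at the shortest input, so pairing length-1 lists with a
# length-8252 list yields exactly one tuple.
joints = [['j0']]            # len 1: one nested list of per-frame joints
names = [['frame0']]         # len 1
bbxes = ['b0', 'b1', 'b2']   # stand-in for the 8252 per-frame boxes (len 3 here)

pairs = list(zip(joints, names, bbxes))
print(len(pairs))  # 1 -- boxes after the first are silently dropped

# Wrapping the per-frame box list in an outer list restores the nesting
# the loop expects, so the inner zip visits every frame's box:
bbxes_fixed = [bbxes]
(c_j, c_n, c_b), = zip(joints, names, bbxes_fixed)
print(len(c_b))  # 3 -- all boxes reachable by the inner loop
```

This is why the proposed fix loads the pickle as `bbxes = [cPickle.load(f)]` rather than using the flat list directly.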

MeteorsHub avatar Apr 23 '18 12:04 MeteorsHub

Thanks a lot for informing me about this. I'll update accordingly.

melonwan avatar Apr 23 '18 14:04 melonwan

Have you ever tried RGB-D input? I mean changing the input from [B, H, W, 1] to [B, H, W, 4] and training your model. Of course it would increase computation, but would it also increase accuracy?

MeteorsHub avatar Apr 23 '18 15:04 MeteorsHub

This idea is great. Actually, I've tried something similar: using hand color to crop out the hand. Unfortunately, RGB and depth are not calibrated and well aligned in the NYU dataset.

melonwan avatar Apr 23 '18 16:04 melonwan