
Training data for NYU dataset

Open · fanglinpu opened this issue Aug 03 '18 · 6 comments

I find that you use all three views of depth images for training in denseReg-master/data/nyu.py:

def loadAnnotation(self, is_trun=False):
    '''is_trun: True: to load 14 joints from self.keep_list
                False: to load all joints
    '''
    t1 = time.time()
    path = os.path.join(self.src_dir, 'joint_data.mat')
    mat = sio.loadmat(path)
    camera_num = 1 if self.subset == 'testing' else 3
    joints = [mat['joint_xyz'][idx] for idx in range(camera_num)]
    names = [['depth_{}_{:07d}.png'.format(camera_idx+1, idx+1)
              for idx in range(len(joints[camera_idx]))]
             for camera_idx in range(camera_num)]

But for a fair comparison, only view 1 images should be used for training.

fanglinpu · Aug 03 '18 03:08

Thanks a lot for pointing this out. Actually, we only used the first view for training; see another part of the code, in dataset.py (the NYU branch, line 64), where only the first 1/3 of the data is fed. I leave this interface available for ease of other usage.
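Conceptually, the slicing works because loadAnnotation orders the data view by view, so the first 1/3 is exactly the view-1 subset. A minimal sketch (with assumed variable names, not the repo's exact code):

# Sketch only: assume 'names' and 'joints' are flat lists covering all
# 3 camera views in order, as built by loadAnnotation.
num_per_view = len(names) // 3
train_names = names[:num_per_view]     # depth_1_*.png entries only
train_joints = joints[:num_per_view]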

melonwan · Aug 03 '18 04:08

Thank you for your answer.

fanglinpu · Aug 03 '18 07:08

I am a little bit confused about the depth normalization: why do the hand depth values range from com[2] - D_RANGE to com[2] + D_RANGE*0.5? The corresponding code is as follows:

import tensorflow as tf  # TF 1.x graph mode; D_RANGE is a module-level constant

def norm_dm(dms, coms):
    def fn(elems):
        dm, com = elems[0], elems[1]
        # Depth window of width D_RANGE centered on the hand's center of mass com[2].
        max_depth = com[2]+D_RANGE*0.5
        min_depth = com[2]-D_RANGE*0.5
        # Note the lower mask bound is min_depth-D_RANGE*0.5, i.e. com[2]-D_RANGE.
        mask = tf.logical_and(tf.less(dm, max_depth), tf.greater(dm, min_depth-D_RANGE*0.5))
        # Map [min_depth, max_depth] to [0, 1]; pixels outside the mask become -1.
        normed_dm = tf.where(mask, tf.divide(dm-min_depth, D_RANGE), -1.0*tf.ones_like(dm))
        return [normed_dm, com]

    norm_dms, _ = tf.map_fn(fn, [dms, coms])

    return norm_dms

I think the hand depth values should range from com[2] - D_RANGE*0.5 to com[2] + D_RANGE*0.5, but the provided code uses:

mask = tf.logical_and(tf.less(dm, max_depth), tf.greater(dm, min_depth-D_RANGE*0.5))
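To make the asymmetry concrete, here is a worked example with hypothetical numbers (the D_RANGE and com[2] values are assumptions, not taken from the repo):

# Hypothetical values, for illustration only.
D_RANGE = 300.0   # assumed depth range in mm
com_z = 600.0     # assumed hand center-of-mass depth

max_depth = com_z + D_RANGE * 0.5        # 750.0
min_depth = com_z - D_RANGE * 0.5        # 450.0
mask_lower = min_depth - D_RANGE * 0.5   # 300.0, i.e. com_z - D_RANGE

# A pixel at depth 400 passes the mask (300 < 400 < 750) but normalizes to
# (400 - 450) / 300 = -0.1667, a negative value rather than the -1
# background sentinel.
print((400.0 - min_depth) / D_RANGE)     # -0.16666...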

fanglinpu · Aug 03 '18 12:08

These are just some trial-and-error hacky stuff.

melonwan · Aug 05 '18 00:08

For the ICVL and MSRA datasets, the cropped images from the testing set are also obtained by exploiting the ground-truth pose; I think this is inappropriate.

fanglinpu · Aug 06 '18 01:08

MSRA provides bounding boxes as a starting point. It is very easy to crop out the hand from ICVL with heuristics, e.g. depth thresholding.
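As an illustration, a minimal depth-thresholding sketch (the threshold values and function name are assumptions, not from the paper or repo):

import numpy as np

def crop_hand_bbox(depth, near=100.0, far=500.0):
    """Bounding box (x0, y0, x1, y1) of pixels within an assumed depth band.

    Works when the hand is the closest object to the camera, as in ICVL.
    """
    mask = (depth > near) & (depth < far)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1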

melonwan · Aug 07 '18 06:08