Training data for NYU dataset
I find that you use all three views of depth images for training in `denseReg-master/data/nyu.py`:

```python
def loadAnnotation(self, is_trun=False):
    '''is_trun:
        True: to load 14 joints from self.keep_list
        False: to load all joints
    '''
    t1 = time.time()
    path = os.path.join(self.src_dir, 'joint_data.mat')
    mat = sio.loadmat(path)
    camera_num = 1 if self.subset == 'testing' else 3
    joints = [mat['joint_xyz'][idx] for idx in range(camera_num)]
    names = [['depth_{}_{:07d}.png'.format(camera_idx+1, idx+1)
              for idx in range(len(joints[camera_idx]))]
             for camera_idx in range(camera_num)]
```
But for fair comparison only view 1 images should be used for training.
Thanks a lot for pointing this out. Actually, we only use the first view for training. See another part of the code, dataset.py/nyu line 64, where only the first 1/3 of the data are fed. I left this interface available for ease of other usage.
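To illustrate why taking the first 1/3 of the data is equivalent to training on view 1 only: `loadAnnotation` builds the per-view lists in camera order, so view-1 entries come first when the lists are flattened. The sketch below uses toy data and hypothetical names (it is not the repo's actual code):

```python
# Toy stand-in for the per-view name lists built by loadAnnotation:
# 3 camera views, 4 frames each, ordered view 1, view 2, view 3.
names = [['depth_{}_{:07d}.png'.format(v + 1, i + 1) for i in range(4)]
         for v in range(3)]

# Flatten in view order, then keep only the first 1/3 (= all of view 1).
flat_names = [n for view in names for n in view]
view1_names = flat_names[:len(flat_names) // 3]
print(view1_names)
```

Because the views are concatenated in order, slicing off the first third selects exactly the `depth_1_*` images.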
Thank you for your answer.
I am a little bit confused about the depth normalization processing: why do the hand depth values range from com[2]-D_RANGE to com[2]+D_RANGE*0.5? The corresponding code is as follows:
```python
def norm_dm(dms, coms):
    def fn(elems):
        dm, com = elems[0], elems[1]
        max_depth = com[2] + D_RANGE*0.5
        min_depth = com[2] - D_RANGE*0.5
        mask = tf.logical_and(tf.less(dm, max_depth),
                              tf.greater(dm, min_depth - D_RANGE*0.5))
        normed_dm = tf.where(mask, tf.divide(dm - min_depth, D_RANGE),
                             -1.0*tf.ones_like(dm))
        return [normed_dm, com]
    norm_dms, _ = tf.map_fn(fn, [dms, coms])
    return norm_dms
```
I think the hand depth values should range from com[2]-D_RANGE*0.5 to com[2]+D_RANGE*0.5, but the provided code uses:

```python
mask = tf.logical_and(tf.less(dm, max_depth), tf.greater(dm, min_depth-D_RANGE*0.5))
```
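To make the asymmetry concrete: since `min_depth = com[2] - D_RANGE*0.5`, the lower bound `min_depth - D_RANGE*0.5` works out to `com[2] - D_RANGE`, so the kept band is (com[2]-D_RANGE, com[2]+D_RANGE*0.5). A NumPy sketch of the same mask logic (TF swapped out for clarity; the D_RANGE value here is an assumed placeholder, in mm):

```python
import numpy as np

D_RANGE = 300.0  # assumed value for illustration

def norm_dm_np(dm, com):
    """NumPy re-statement of the TF mask logic quoted above."""
    max_depth = com[2] + D_RANGE * 0.5
    min_depth = com[2] - D_RANGE * 0.5
    # Lower bound is min_depth - D_RANGE*0.5 = com[2] - D_RANGE,
    # hence the asymmetric range (com[2]-D_RANGE, com[2]+D_RANGE*0.5).
    mask = (dm < max_depth) & (dm > min_depth - D_RANGE * 0.5)
    return np.where(mask, (dm - min_depth) / D_RANGE, -1.0)

com = np.array([0.0, 0.0, 1000.0])
dm = np.array([650.0, 750.0, 1000.0, 1149.0, 1200.0])
print(norm_dm_np(dm, com))
```

With com[2] = 1000 and D_RANGE = 300, depths below 700 or above 1150 are masked to -1, while 750 (100 mm in front of min_depth) maps to a negative normalized value, confirming the range is wider below the center of mass than above it.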
These are just some trial-and-error, hacky choices.
For the ICVL and MSRA datasets, the cropped image from the testing set is also obtained by exploiting the ground-truth pose, which I think is inappropriate.
MSRA provides a bounding box as a starting point. It is very easy to crop out the hand from ICVL with heuristics, e.g. depth thresholding.
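A minimal sketch of the depth-thresholding heuristic mentioned above, assuming the hand is the closest object to the camera. The function name and the near/far band are hypothetical, not from the repo:

```python
import numpy as np

def crop_hand_by_depth(dm, near=200.0, far=600.0):
    """Hypothetical heuristic: keep pixels whose depth falls in a
    near/far band and crop their tight bounding box."""
    mask = (dm > near) & (dm < far)
    if not mask.any():
        return None
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    return dm[top:bottom + 1, left:right + 1]

# Toy depth map: background at 1000 mm, a "hand" patch at 400 mm.
dm = np.full((8, 8), 1000.0)
dm[2:5, 3:6] = 400.0
crop = crop_hand_by_depth(dm)
print(crop.shape)
```

In practice the band would be set per dataset (or relative to the minimum observed depth), and morphological cleanup would remove stray foreground pixels before taking the bounding box.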