
Code for Multiview RGB-D Reconstruction

MiaApr opened this issue 1 year ago • 3 comments

Hi, thanks for your great work!
I tried to reconstruct the 3D point cloud by directly unprojecting the foreground pixels using the predicted multiview RGB-D images, but I couldn’t get satisfactory results without post-processing. Could you please release the code for your multiview RGB-D reconstruction?

MiaApr · Sep 20 '24 03:09

Hi!

We perform two post-processing steps. First, we mask the pixels we unproject via a threshold on the RGB image. Second, we remove outliers; a sample function is below.

import numpy as np
import open3d as o3d

def filter_point_cloud(pts):
    # Build an Open3D point cloud from the (N, 3) array of unprojected points
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(pts)
    # Voxel-downsample, then drop statistical outliers
    downpcd = pcd.voxel_down_sample(voxel_size=0.01)
    downpcd, ind = downpcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    return np.asarray(downpcd.points, np.float32)
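
For the first step, a minimal sketch of masking via an RGB threshold might look like the following. The white-background assumption and the 0.95 threshold are illustrative choices, not values taken from the repository:

import torch

def foreground_mask(rgb, threshold=0.95):
    '''
    Hypothetical helper: keep pixels that are not near the (assumed white) background.
    rgb: (H, W, 3) tensor with values in [0, 1]; threshold is an assumed value.
    '''
    # A pixel counts as foreground if any channel falls below the threshold
    return (rgb < threshold).any(dim=-1)

The resulting boolean mask can then be used to zero out the background in the depth map before unprojection.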

zhizdev · Oct 31 '24 16:10

Thank you for your response. I've unprojected the point clouds as you suggested, like this:

import torch
import open3d as o3d
from pytorch3d.renderer import PerspectiveCameras

# mask out the background, then unproject with the PyTorch3D camera
depth_map = depth_map * mask
camera = PerspectiveCameras(focal_length=((2.1875, 2.1875),),
                            principal_point=((0, 0),),
                            image_size=image_size,
                            R=R,
                            T=T,)

# per-pixel coordinate grid
grid = torch.meshgrid(torch.arange(image_size[0]), torch.arange(image_size[1]), indexing='ij')
grid = torch.stack(grid, dim=-1).float()

# keep only pixels with valid (non-zero) depth
non_zero_mask = depth_map > 0
xy_masked = grid[non_zero_mask]
z_masked = depth_map[non_zero_mask]
xyz = torch.cat([xy_masked, z_masked.unsqueeze(-1)], dim=-1)
world_points = camera.unproject_points(xyz, world_coordinates=True)

# Open3D expects a numpy array, not a torch tensor
pcds = o3d.geometry.PointCloud()
pcds.points = o3d.utility.Vector3dVector(world_points.cpu().numpy())

downpcd = pcds.voxel_down_sample(voxel_size=0.01)
downpcd, ind = downpcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

But I'm still not getting an ideal result. The depth scale seems off. Did you apply any scale and shift transformations on the predicted depth map?

MiaApr · Feb 20 '25 03:02

If you are loading depth from a .png in [0, 1], then you need to unscale the values. If you are using pred_depth from model_output[:,4:,...] as in test.py, you first need to unnormalize it from [-1, 1] to [0, 1]. The Objaverse depth values should range from [0.5, 2.5].

import torch

def _unscale_depth(depths):
    '''
    Rescale depth from [0, 1] to [0.5, 2.5]
    '''
    shift = 0.5
    scale = 2.0
    depths = depths * scale + shift
    return depths

def unnormalize(x):
    '''
    Unnormalize [-1, 1] to [0, 1]
    '''
    return torch.clip((x + 1.0) / 2.0, 0.0, 1.0)
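
Putting the two together for the prediction path mentioned above, a hedged usage sketch (assuming model_output is the tensor from test.py) would be:

# model_output[:, 4:, ...] is the predicted depth channel, still in [-1, 1]
pred_depth = model_output[:, 4:, ...]
depth = _unscale_depth(unnormalize(pred_depth))  # roughly in [0.5, 2.5], matching Objaverse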

zhizdev · Feb 21 '25 07:02