Index out of bounds
Hi,
Thanks for sharing the code.
I'm trying to train DGR on my own dataset, so I made a dataloader that returns the same format as the other loaders in this repo. For example, I printed what __getitem__ returns right before its return statement:

print('u0: {}, u1: {}, c0: {}, c1: {}, f0: {}, f1: {}, m: {}, trans: {}'.format(
    unique_xyz0_th.shape, unique_xyz1_th.shape, coords0.shape, coords1.shape,
    feats0.shape, feats1.shape, len(matches), trans.shape))
return (unique_xyz0_th.float(),
        unique_xyz1_th.float(), coords0.int(), coords1.int(), feats0.float(),
        feats1.float(), matches, trans, extra_package)
and here is some example output:
u0: torch.Size([279, 3]), u1: torch.Size([281, 3]), c0: torch.Size([279, 3]), c1: torch.Size([281, 3]), f0: torch.Size([279, 1]), f1: torch.Size([281, 1]), m: 46745, trans: (4, 4)
u0: torch.Size([900, 3]), u1: torch.Size([859, 3]), c0: torch.Size([900, 3]), c1: torch.Size([859, 3]), f0: torch.Size([900, 1]), f1: torch.Size([859, 1]), m: 4696, trans: (4, 4)
u0: torch.Size([1159, 3]), u1: torch.Size([1153, 3]), c0: torch.Size([1159, 3]), c1: torch.Size([1153, 3]), f0: torch.Size([1159, 1]), f1: torch.Size([1153, 1]), m: 298974, trans: (4, 4)
u0: torch.Size([2092, 3]), u1: torch.Size([2048, 3]), c0: torch.Size([2092, 3]), c1: torch.Size([2048, 3]), f0: torch.Size([2092, 1]), f1: torch.Size([2048, 1]), m: 587866, trans: (4, 4)
These look similar to what 3DMatch dataset returns.
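As a quick self-check (a hypothetical helper, not part of the repo), the per-sample shape invariant implied by the printouts above can be asserted directly: within each cloud, the points, coordinates, and features must have the same number of rows.

```python
import numpy as np

def check_sample(u0, u1, c0, c1, f0, f1):
    """Hypothetical sanity check for one __getitem__ sample: the point,
    coordinate, and feature arrays of each cloud must have the same
    number of rows, matching the printed shapes above."""
    assert u0.shape[0] == c0.shape[0] == f0.shape[0], 'cloud 0 sizes differ'
    assert u1.shape[0] == c1.shape[0] == f1.shape[0], 'cloud 1 sizes differ'
    return u0.shape[0], u1.shape[0]

# Dummy arrays mirroring the first printed sample (279 and 281 points).
u0, c0, f0 = np.zeros((279, 3)), np.zeros((279, 3), np.int32), np.zeros((279, 1))
u1, c1, f1 = np.zeros((281, 3)), np.zeros((281, 3), np.int32), np.zeros((281, 1))
print(check_sample(u0, u1, c0, c1, f0, f1))  # (279, 281)
```

If a sample ever fails this check, the problem is in the dataloader itself rather than in the collation or training code.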
So I ran the training code, but it fails with:
Traceback (most recent call last):
  File "train.py", line 76, in <module>
    main(config)
  File "train.py", line 55, in main
    trainer.train()
  File "dgr/core/trainer.py", line 135, in train
    self._train_epoch(epoch)
  File "dgr/core/trainer.py", line 237, in _train_epoch
    weights=weights)
  File "dgr/core/trainer.py", line 591, in weighted_procrustes
    X=xyz0[pred_pair[:, 0]].to(self.device),
IndexError: index 588 is out of bounds for dimension 0 with size 588
To see what this means, I also printed xyz0, xyz1, and pred_pair in core/trainer.py:
def weighted_procrustes(self, xyz0s, xyz1s, pred_pairs, weights):
  decomposed_weights = self.decompose_by_length(weights, pred_pairs)
  RT = []
  ws = []
  for xyz0, xyz1, pred_pair, w in zip(xyz0s, xyz1s, pred_pairs, decomposed_weights):
    xyz0.requires_grad = False
    xyz1.requires_grad = False
    ws.append(w.sum().item())
    print('in trainer, xyz0: {}, xyz1: {}, pred_pair: {}'.format(
        xyz0.shape, xyz1.shape, pred_pair))
    predT = GlobalRegistration.weighted_procrustes(
        X=xyz0[pred_pair[:, 0]].to(self.device),
        Y=xyz1[pred_pair[:, 1]].to(self.device),
        w=w,
        eps=np.finfo(np.float32).eps)
    RT.append(predT)
and this is what I got
in trainer, xyz0: torch.Size([1201, 3]), xyz1: torch.Size([1178, 3]), pred_pair: tensor([[ 0, 23],
[ 1, 5],
[ 2, 5],
...,
[585, 531],
[586, 532],
[587, 533]])
in trainer, xyz0: torch.Size([588, 3]), xyz1: torch.Size([569, 3]), pred_pair: tensor([[ 0, 998],
[ 1, 948],
[ 2, 14],
...,
[1188, 1167],
[1189, 1166],
[1190, 1072]])
To me, it looks like the pred_pairs got swapped between batch elements: the first pred_pair has indices up to 587, which matches the second element's xyz0 size (588), while the second pred_pair has indices up to 1190, which only fits the first element's clouds (1201 and 1178). I verified that the 3DMatch training code runs for a while without this error. Do you have any idea why this is happening?
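One way to make the failure explicit before the crash (a hypothetical guard, not DGR code) is to bounds-check the correspondence indices against the cloud sizes. Fed the sizes and extreme indices from the printout above, it shows that each pred_pair fits the *other* element's clouds, which is consistent with the swap hypothesis:

```python
import numpy as np

def pair_in_bounds(n_xyz0, n_xyz1, pred_pair):
    """Hypothetical guard: every row of pred_pair must index a valid
    point of xyz0 (column 0) and xyz1 (column 1)."""
    pp = np.asarray(pred_pair)
    return bool(pp[:, 0].max() < n_xyz0 and pp[:, 1].max() < n_xyz1)

# Sizes and extreme index rows taken from the printout above.
print(pair_in_bounds(1201, 1178, [[0, 23], [587, 533]]))   # True
print(pair_in_bounds(588, 569, [[0, 998], [1190, 1072]]))  # False: the crash
print(pair_in_bounds(588, 569, [[0, 23], [587, 533]]))     # True: the swapped order fits
```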
Best,
For more information, it also stochastically throws a different kind of error, such as this one:
Traceback (most recent call last):
  File "train.py", line 76, in <module>
    main(config)
  File "train.py", line 55, in main
    trainer.train()
  File "dgr/core/trainer.py", line 135, in train
    self._train_epoch(epoch)
  File "dgr/core/trainer.py", line 241, in _train_epoch
    rot_error = batch_rotation_error(pred_rots, gt_rots)
  File "dgr/core/metrics.py", line 32, in batch_rotation_error
    assert len(rots1) == len(rots2)
AssertionError
My environment: PyTorch 1.5.0, CUDA 10.1.243, Python 3.7, Ubuntu 18.04, with gcc 7 installed as described in the README.
Found a bug (or at least an issue):
For the CollationFunctionFactory in base_loader.py, I printed xyz0, xyz1, and len_batch right before collate_pair_fn returns:

for x0, x1, lens in zip(xyz0, xyz1, len_batch):
  print('collate xyz0 {}, xyz1 {}, lenb {}'.format(x0.shape, x1.shape, lens))
return {
    'pcd0': xyz0,
    'pcd1': xyz1,
    'sinput0_C': coords_batch0,
    'sinput0_F': feats_batch0,
    'sinput1_C': coords_batch1,
    'sinput1_F': feats_batch1,
    'correspondences': matching_inds_batch,
    'T_gt': trans_batch,
    'len_batch': len_batch,
    'extra_packages': extra_packages,
}
and the output is
collate xyz0 torch.Size([473, 3]), xyz1 torch.Size([473, 3]), lenb [473, 473]
collate xyz0 torch.Size([412, 3]), xyz1 torch.Size([414, 3]), lenb [412, 414]
collate xyz0 torch.Size([304, 3]), xyz1 torch.Size([298, 3]), lenb [459, 463]
collate xyz0 torch.Size([459, 3]), xyz1 torch.Size([463, 3]), lenb [411, 407]
collate xyz0 torch.Size([411, 3]), xyz1 torch.Size([407, 3]), lenb [402, 398]
collate xyz0 torch.Size([402, 3]), xyz1 torch.Size([398, 3]), lenb [269, 264]
collate xyz0 torch.Size([339, 3]), xyz1 torch.Size([334, 3]), lenb [339, 334]
collate xyz0 torch.Size([427, 3]), xyz1 torch.Size([425, 3]), lenb [427, 425]
collate xyz0 torch.Size([358, 3]), xyz1 torch.Size([362, 3]), lenb [358, 362]
collate xyz0 torch.Size([369, 3]), xyz1 torch.Size([345, 3]), lenb [296, 295]
collate xyz0 torch.Size([296, 3]), xyz1 torch.Size([295, 3]), lenb [335, 313]
collate xyz0 torch.Size([335, 3]), xyz1 torch.Size([313, 3]), lenb [366, 371]
As we can see, the shapes of xyz0 and xyz1 do not match lenb on some lines. Notably, wherever they diverge, lenb holds the sizes of the *next* pair (e.g. the third line's lenb [459, 463] is the fourth line's shapes), so len_batch appears to run one entry ahead of xyz0/xyz1 for part of the batch.
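A hypothetical check over the collate output makes the shift easy to spot, using only the cloud lengths (the data from the first six printed lines above, with the point contents stubbed out):

```python
def len_batch_mismatches(xyz0, xyz1, len_batch):
    """Hypothetical check: return the batch positions where len_batch
    disagrees with the actual cloud sizes."""
    return [i for i, (x0, x1, (n0, n1)) in enumerate(zip(xyz0, xyz1, len_batch))
            if len(x0) != n0 or len(x1) != n1]

# Cloud sizes from the first six printed lines above.
sizes0 = [473, 412, 304, 459, 411, 402]
sizes1 = [473, 414, 298, 463, 407, 398]
xyz0 = [[None] * n for n in sizes0]
xyz1 = [[None] * n for n in sizes1]
len_batch = [[473, 473], [412, 414], [459, 463], [411, 407], [402, 398], [269, 264]]

print(len_batch_mismatches(xyz0, xyz1, len_batch))  # [2, 3, 4, 5]
```

The mismatches start at index 2 and each bad lenb equals the following pair's sizes, which is what an off-by-one in the collate function's length bookkeeping (e.g. appending lengths for a sample that is later skipped) would produce.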
Hi,
I have the same issue. Could you tell me whether you solved it, and how?
Thanks in advance
I am facing a similar issue as well. How did you solve this?