
issues about create_data

Open sunnyHelen opened this issue 3 years ago • 21 comments

Hi, thanks for sharing your great work. I ran into some issues while creating data by running create_data.py:

First create reduced point cloud for training set
[ ] 0/3712, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/create_data.py", line 247, in <module>
    out_dir=args.out_dir)
  File "tools/create_data.py", line 24, in kitti_data_prep
    kitti.create_reduced_point_cloud(root_path, info_prefix)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/kitti_converter.py", line 374, in create_reduced_point_cloud
    _create_reduced_point_cloud(data_path, train_info_path, save_path)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/kitti_converter.py", line 314, in _create_reduced_point_cloud
    count=-1).reshape([-1, num_features])
ValueError: cannot reshape array of size 461536 into shape (6)

It seems num_features should be set to 4 (and front_camera_id to 2?) on this line: https://github.com/zhyever/SimIPU/blob/5b346e392c161a5e9fdde09b1692656bc7cd3faf/tools/data_converter/kitti_converter.py#L291
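The reshape error is consistent with a wrong feature count. KITTI velodyne .bin scans store each point as four float32 values (x, y, z, reflectance), and 461536 is divisible by 4 but not by 6. Below is a minimal sketch of the load step with a clearer guard, assuming a flat float32 point file. `load_points` is a hypothetical helper that mirrors the `np.fromfile(...).reshape([-1, num_features])` pattern in the traceback. It is not SimIPU code.

```python
import numpy as np


def load_points(path, num_features=4):
    """Load a KITTI-style point cloud stored as flat float32 values.

    Hypothetical helper: same fromfile/reshape pattern as the traceback,
    but it fails with an explicit message instead of a bare reshape error.
    """
    flat = np.fromfile(path, dtype=np.float32, count=-1)
    if flat.size % num_features != 0:
        raise ValueError(
            f"{path}: {flat.size} values is not a multiple of "
            f"num_features={num_features}; KITTI velodyne scans use 4 "
            "(x, y, z, reflectance)")
    return flat.reshape([-1, num_features])


# The array size from the traceback splits evenly into 4 features, not 6.
print(461536 % 4, 461536 % 6)  # 0 4
```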

I assume this fixes that problem, but I then hit another one at the "Create GT Database of KittiDataset" step:

[ ] 0/3712, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/create_data.py", line 247, in <module>
    out_dir=args.out_dir)
  File "tools/create_data.py", line 44, in kitti_data_prep
    with_bbox=True) # for moca
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/create_gt_database.py", line 275, in create_groundtruth_database
    P0 = np.array(example['P0']).reshape(4, 4)
KeyError: 'P0'

Can you help me figure out how to solve these issues?

sunnyHelen, May 10 '22

You should set front_camera_id to 0 for KITTI. https://github.com/zhyever/SimIPU/blob/5b346e392c161a5e9fdde09b1692656bc7cd3faf/tools/data_converter/kitti_converter.py#L292

:D Since the released code only supports pre-training on KITTI, data preparation is similar to standard mmdet3d. So you can use standard mmdet3d (the version specified in README.md) to run create_data.py and then link the prepared data into the SimIPU repo.

zhyever, May 11 '22

Thank you for your quick reply. When I create the GT database of KittiDataset, I get:

[ ] 0/3712, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/create_data.py", line 247, in <module>
    out_dir=args.out_dir)
  File "tools/create_data.py", line 44, in kitti_data_prep
    with_bbox=True) # for moca
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/create_gt_database.py", line 275, in create_groundtruth_database
    P0 = np.array(example['P0']).reshape(4, 4)
KeyError: 'P0'

https://github.com/zhyever/SimIPU/blob/5b346e392c161a5e9fdde09b1692656bc7cd3faf/tools/data_converter/create_gt_database.py#L275

It seems there is no P0 key. There are also some differences compared with the mmdet3d version. How should I create the data properly?

sunnyHelen, May 11 '22

Sorry I missed your questions; I have been busy recently. My last answer was wrong: you should set front_camera_id=2.

Actually, I recommend cloning mmdet3d and using the official code to generate the KITTI dataset. You can then link the mmdet3d-generated KITTI data directly into the SimIPU repo.
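Below is a minimal sketch of the linking step. It assumes the SimIPU configs read from data/kitti/, as the standard mmdet3d KITTI configs do. `link_kitti` is a hypothetical helper, and both paths are placeholders for your own checkouts.

```python
import os
from pathlib import Path


def link_kitti(mmdet3d_kitti, simipu_root):
    """Symlink an mmdet3d-prepared KITTI folder to <simipu_root>/data/kitti.

    Assumes the standard mmdet3d layout (data/kitti); refuses to overwrite
    an existing folder or link at the destination.
    """
    src = Path(mmdet3d_kitti).resolve()
    dst = Path(simipu_root) / "data" / "kitti"
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.exists() or dst.is_symlink():
        raise FileExistsError(f"{dst} already exists; remove it first")
    os.symlink(src, dst, target_is_directory=True)
    return dst
```

For example, `link_kitti("/path/to/mmdetection3d/data/kitti", "/path/to/SimIPU")`. A symlink keeps a single copy of the dataset on disk. Re-running create_data.py in the mmdet3d checkout is then visible in SimIPU immediately.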

zhyever, May 18 '22

Got it. Thanks for your reply.

sunnyHelen, May 23 '22

But I ran into a problem when trying camera-LiDAR fusion-based 3D object detection on the KITTI dataset. I followed your instructions:

  bash tools/dist_train.sh project_cl/configs/kitti_det3d/moca_r50_kitti.py 8 --work-dir work_dir/

The error occurs while loading data. Could it be related to the data labels? Could you please help me?

Original Traceback (most recent call last):
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/lustre/chen/hzha/mmdetection/mmdet/datasets/dataset_wrappers.py", line 151, in __getitem__
    return self.dataset[idx % self._ori_len]
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/custom_3d.py", line 387, in __getitem__
    data = self.prepare_train_data(idx)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/kitti_dataset.py", line 122, in prepare_train_data
    example = self.pipeline(input_dict)
  File "/mnt/lustre/chen/hzha/mmdetection/mmdet/datasets/pipelines/compose.py", line 40, in __call__
    data = t(data)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/transforms_3d.py", line 185, in __call__
    img=img)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 388, in sample_all
    avoid_coll_boxes_2d)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in sample_class_v2
    sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in <listcomp>
    sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
KeyError: 'box2d_camera'

sunnyHelen, May 23 '22

Oh, this issue is caused by the box2d_camera key used in db_sampler. In 'tools/create_data.py', you can find the call to create_groundtruth_database, which generates the sampled objects used for data augmentation. Since we chose Moca as our baseline method, we made many modifications to this GT database generation function.
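You can confirm whether an existing GT database has the extra key before training. The sketch below assumes the standard mmdet3d dbinfos layout: a pickled dict that maps each class name to a list of per-object dicts. 'box2d_camera' is the key that sample_class_v2 reads in the traceback. A kitti_dbinfos_train.pkl produced by the official mmdet3d code presumably lacks it, which would match the KeyError. Both helper names are hypothetical.

```python
import pickle


def missing_key_report(db_infos, key="box2d_camera"):
    """Count GT-database entries per class that lack `key`.

    Assumes the mmdet3d dbinfos layout: {class_name: [object_info_dict, ...]}.
    """
    return {
        cls: sum(1 for info in infos if key not in info)
        for cls, infos in db_infos.items()
    }


def check_dbinfos(pkl_path, key="box2d_camera"):
    """Load a dbinfos pickle and return {class: count} for entries missing `key`."""
    with open(pkl_path, "rb") as f:
        db_infos = pickle.load(f)
    bad = {cls: n for cls, n in missing_key_report(db_infos, key).items() if n}
    if bad:
        print(f"entries missing {key!r}: {bad}")
    return bad
```

For example, `check_dbinfos("data/kitti/kitti_dbinfos_train.pkl")` returns an empty dict when every sampled object carries the key.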

Hence, if you created the KITTI dataset with the official mmdet3d codebase, I think you should run the create_groundtruth_database function in SimIPU (or Moca) to create the sampled-object dataset, commenting out the other lines of code in the kitti_data_prep function. If you created the sampled-object dataset with our code and still hit these bugs, please report it to me and I will check. I ran the code before pushing this repo to GitHub, so it should have worked.

zhyever, May 23 '22

Thanks a lot. I had used the official mmdet3d to create the data labels. I'll follow your instructions and run the create_groundtruth_database function.

sunnyHelen, May 26 '22

Hi. I tried running the create_groundtruth_database function, but it seems we are back to the previous problem:

[ ] 0/3712, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "tools/create_data.py", line 247, in <module>
    out_dir=args.out_dir)
  File "tools/create_data.py", line 44, in kitti_data_prep
    with_bbox=True) # for moca
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/create_gt_database.py", line 275, in create_groundtruth_database
    P0 = np.array(example['P0']).reshape(4, 4)
KeyError: 'P0'

sunnyHelen, May 29 '22

Let me explain why these problems occur. We first ran experiments on the KITTI dataset, where the images come from camera 2. So, when creating the KITTI data, every PX should be P2 (i.e., use the camera parameters of camera 2). Later, we ran experiments on Waymo, where the images come from the front camera, whose index is 0. Hence, we hacked the code to generate the related data with P0.
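For reference, each KITTI object calib file lists all four projection matrices P0 to P3 as 12 numbers each, so "use P2" means selecting camera 2's row. Below is a minimal parsing sketch. `read_kitti_projections` is a hypothetical helper. The 4x4 padding is an assumption chosen to match the `reshape(4, 4)` that create_gt_database.py applies.

```python
import numpy as np


def read_kitti_projections(calib_text, extend=True):
    """Parse the P0..P3 projection matrices from KITTI calib file text.

    Each 'Pk:' line holds 12 numbers (a 3x4 matrix). With extend=True the
    matrix is padded to 4x4 by appending a [0, 0, 0, 1] row.
    """
    mats = {}
    for line in calib_text.strip().splitlines():
        name, _, values = line.partition(":")
        name = name.strip()
        if name not in ("P0", "P1", "P2", "P3"):
            continue  # skip R0_rect, Tr_velo_to_cam, Tr_imu_to_velo
        mat = np.array(values.split(), dtype=np.float64).reshape(3, 4)
        if extend:
            mat = np.vstack([mat, [0.0, 0.0, 0.0, 1.0]])
        mats[name] = mat
    return mats
```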

However, when I pushed the KITTI-only code, I forgot to switch the data-related code back to the KITTI version. That is why you get KeyError: 'P0'. For KITTI, just use P2. :D
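The fix amounts to reading 'P2' instead of 'P0' at create_gt_database.py line 275. The sketch below makes the camera index explicit rather than hard-coded. It assumes `example` is the dict that line reads from. `projection_from_example` is a hypothetical helper, not code from the repo.

```python
import numpy as np


def projection_from_example(example, camera_id=2):
    """Return the 4x4 projection matrix for `camera_id` from an info dict.

    KITTI (as used here) needs camera 2 ('P2'); the Waymo hack used camera 0
    ('P0'). Raises a KeyError listing the projection keys that do exist.
    """
    key = f"P{camera_id}"
    if key not in example:
        available = sorted(
            k for k in example if isinstance(k, str) and k.startswith("P"))
        raise KeyError(f"{key!r} not found; available projection keys: {available}")
    return np.array(example[key]).reshape(4, 4)
```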

zhyever, May 29 '22

Hi, thanks for your help. I successfully created the labels after changing P0 to P2, but the error still occurs when running:

  bash tools/dist_train.sh project_cl/configs/kitti_det3d/moca_r50_kitti.py 8 --work-dir work_dir/

Original Traceback (most recent call last):
  File "/mnt/cache/chenzhuo1/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/mnt/cache/chenzhuo1/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/cache/chenzhuo1/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/lustre/chenzhuo1/hzha/mmdetection/mmdet/datasets/dataset_wrappers.py", line 151, in __getitem__
    return self.dataset[idx % self._ori_len]
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/custom_3d.py", line 387, in __getitem__
    data = self.prepare_train_data(idx)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/kitti_dataset.py", line 122, in prepare_train_data
    example = self.pipeline(input_dict)
  File "/mnt/lustre/chenzhuo1/hzha/mmdetection/mmdet/datasets/pipelines/compose.py", line 40, in __call__
    data = t(data)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/transforms_3d.py", line 185, in __call__
    img=img)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 388, in sample_all
    avoid_coll_boxes_2d)
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in sample_class_v2
    sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
  File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in <listcomp>
    sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
KeyError: 'box2d_camera'

sunnyHelen, May 29 '22

I will check everything from scratch ASAP and update this repo. By the way, this problem only affects Moca training (our downstream 3D detection task). Even though the gt_sampler does not work, you can still run SimIPU, since our pre-training method does not need any GT information.

zhyever, May 29 '22

Yeah, I've tried the pre-training code, and it works fine. Thanks for your help.

sunnyHelen, May 29 '22

Hi @zhyever, I am running into the same error (KeyError: 'box2d_camera') for the downstream evaluation on Kitti dataset. Pretraining step does not have any issue. Let me know if there is an update. Thanks for the help!

bhavyagoyal, May 30 '22

Hi, is there any update on solving this problem?

sunnyHelen, Jun 07 '22

Sorry for the late reply.

Download the pkl and the zipped gt_database.

Rename the pkl file to kitti_dbinfos_train.pkl and put it under your data folder. Unzip the .zip file, rename the folder to kitti_gt_database, and put it under your data folder.
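Before rerunning training, a quick check can confirm the two items are where the sampler will look. The sketch assumes your data folder is data/kitti/, as in the standard mmdet3d KITTI configs (where db_sampler's info_path is data_root + 'kitti_dbinfos_train.pkl'). Adjust `data_root` to match your own config. `check_gt_database` is a hypothetical helper.

```python
from pathlib import Path


def check_gt_database(data_root="data/kitti"):
    """Return the expected GT-sampling items missing under data_root.

    Assumes the mmdet3d KITTI layout: kitti_dbinfos_train.pkl and the
    kitti_gt_database/ folder sit directly under data_root.
    """
    root = Path(data_root)
    expected = {
        "kitti_dbinfos_train.pkl": (root / "kitti_dbinfos_train.pkl").is_file(),
        "kitti_gt_database/": (root / "kitti_gt_database").is_dir(),
    }
    return [name for name, ok in expected.items() if not ok]
```

An empty list means both items are in place.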

The result should look like this: [image]

Then, run the training script again.

zhyever, Jun 07 '22

Thanks a lot for your reply. It seems the data problem is solved, but there are still some problems during training:

Traceback (most recent call last):
  File "tools/train.py", line 222, in <module>
    main()
  File "tools/train.py", line 218, in main
    meta=meta)
  File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/mnt/lustre/chen/hzha/mmdetection/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 42, in train_step
    and self.reducer._rebuild_buckets()):
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss. If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable). Parameter indices which did not receive grad for rank 0: 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 ... In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error

sunnyHelen, Jun 07 '22

I tried passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, but it doesn't work.

sunnyHelen, Jun 07 '22

Set this flag in your config file instead of passing it on the command line.

Just add the line find_unused_parameters=True to your config file.
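In mmdet 2.x, train_detector reads this flag with cfg.get('find_unused_parameters', False) when it wraps the model in MMDistributedDataParallel, so the config file is where the setting takes effect. Below is a sketch of the addition. It assumes the flag goes at the top level of the config used in this thread, and that the mmdet version bundled here follows the 2.x behavior.

```python
# Top level of project_cl/configs/kitti_det3d/moca_r50_kitti.py (or a config
# it inherits from). mmdet 2.x's train_detector reads this value via
# cfg.get('find_unused_parameters', False) and forwards it to
# MMDistributedDataParallel when distributed training is enabled.
find_unused_parameters = True
```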

zhyever, Jun 07 '22

Yes. It works! Many thanks for your help.

sunnyHelen, Jun 07 '22

Thanks @zhyever. Fine-tuning on KITTI 3D detection works now. But there seems to be an error during evaluation (after 30 epochs). Here is the error log:

  File "tools/train.py", line 222, in <module>
    main()
  File "tools/train.py", line 218, in main
    meta=meta)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdetection/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 279, in after_train_epoch
    key_score = self.evaluate(runner, results)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 177, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/datasets/kitti_dataset.py", line 412, in evaluate
    eval_types=eval_types)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 709, in kitti_eval
    eval_types)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 613, in do_eval
    min_overlaps)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 479, in eval_class
    rets = calculate_iou_partly(dt_annos, gt_annos, metric, num_parts)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 382, in calculate_iou_partly
    dt_boxes).astype(np.float64)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 116, in bev_box_overlap
    from .rotate_iou import rotate_iou_gpu_eval
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/rotate_iou.py", line 292, in <module>
    criterion=-1):
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/decorators.py", line 101, in kernel_jit
    kernel.bind()
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/compiler.py", line 548, in bind
    self._func.get()
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/compiler.py", line 426, in get
    ptx = self.ptx.get()
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/compiler.py", line 397, in get
    **self._extra_options)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 496, in llvm_to_ptx
    ptx = cu.compile(**opts)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 233, in compile
    self._try_error(err, 'Failed to compile\n')
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 251, in _try_error
    self.driver.check_error(err, "%s\n%s" % (msg, self.get_log()))
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 141, in check_error
    raise exc
numba.cuda.cudadrv.error.NvvmError: Failed to compile

<unnamed> (66, 23): parse expected comma after load's type
NVVM_ERROR_COMPILATION

bhavyagoyal, Jun 07 '22

That's related to the build of mmdet3d (here, the mmdet3d inside the SimIPU repo). Refer to Issue for more information.

zhyever, Jun 08 '22