
PointPillar Kitti training issue

Open jsohn123 opened this issue 4 years ago • 0 comments

My KITTI data folder is split into three directories, as per the script: data_object_calib, data_object_label_2, and data_object_velodyne.


import os
import sys
import open3d.ml as _ml3d
import open3d.ml.tf  as ml3d


cfg_file = "/content/Desktop/point_pillars/Open3D-ML/ml3d/configs/pointpillars_kitti.yml"
cfg = _ml3d.utils.Config.load_from_file(cfg_file)

model = ml3d.models.PointPillars(**cfg.model)

datapath = "/content/Desktop/KITTI_DATASET/KITTI_PTCLOUD_DATA/data_object_velodyne"

dataset = ml3d.datasets.KITTI(dataset_path=datapath, use_cache=True)

pipeline = ml3d.pipelines.ObjectDetection(model=model, dataset=dataset, **cfg.pipeline)

pipeline.run_train()

I am using the KITTI yml file straight from master:

dataset:
  name: KITTI
  dataset_path: # path/to/your/dataset
  cache_dir: ./logs/cache
  steps_per_epoch_train: 5000

model:
  name: PointPillars
  ckpt_path: # path/to/your/checkpoint

  batcher: "ignore"

  point_cloud_range: [0, -39.68, -3, 69.12, 39.68, 1]
  classes: ['Pedestrian', 'Cyclist', 'Car']

  loss:
    focal:
      gamma: 2.0
      alpha: 0.25
      loss_weight: 1.0
    smooth_l1:
      beta: 0.11
      loss_weight: 2.0
    cross_entropy:
      loss_weight: 0.2

  voxelize:
    max_num_points: 32
    voxel_size: &vsize
      [0.16, 0.16, 4]
    max_voxels: [16000, 40000]

  voxel_encoder:
    in_channels: 4
    feat_channels: [64]
    voxel_size: *vsize

  scatter:
    in_channels: 64
    output_shape: [496, 432]

  backbone:
    in_channels: 64
    out_channels: [64, 128, 256]
    layer_nums: [3, 5, 5]
    layer_strides: [2, 2, 2]

  neck:
    in_channels: [64, 128, 256]
    out_channels: [128, 128, 128]
    upsample_strides: [1, 2, 4]
    use_conv_for_no_stride: false

  head:
    in_channels: 384
    feat_channels: 384
    nms_pre: 100
    score_thr: 0.1
    ranges: [
      [0, -39.68, -0.6, 70.4, 39.68, -0.6],
      [0, -39.68, -0.6, 70.4, 39.68, -0.6],
      [0, -39.68, -1.78, 70.4, 39.68, -1.78]
    ]
    sizes: [[0.6, 0.8, 1.73], [0.6, 1.76, 1.73], [1.6, 3.9, 1.56]]
    rotations: [0, 1.57]
    iou_thr: [[0.35, 0.5], [0.35, 0.5], [0.45, 0.6]]

  augment:
    PointShuffle: True
    ObjectRangeFilter:
      point_cloud_range: [0, -39.68, -3, 69.12, 39.68, 1]
    ObjectSample:
      min_points_dict:
        Car: 5
        Pedestrian: 10
        Cyclist: 10
      sample_dict:
        Car: 15
        Pedestrian: 10
        Cyclist: 10


pipeline:
  name: ObjectDetection
  test_compute_metric: true
  batch_size: 6
  val_batch_size: 1
  test_batch_size: 1
  save_ckpt_freq: 5
  max_epoch: 200
  main_log_dir: ./logs
  train_sum_dir: train_log
  grad_clip_norm: 2

  optimizer:
    lr: 0.001
    betas: [0.95, 0.99]
    weight_decay: 0.01

  # evaluation properties
  overlaps: [0.5, 0.5, 0.7]
  similar_classes: {
    Van: Car,
    Person_sitting: Pedestrian
  }
  difficulties: [0, 1, 2]
  summary:
    record_for: []
    max_pts:
    use_reference: false
    max_outputs: 1
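
As a quick sanity check on the config above, the grid implied by point_cloud_range and voxel_size can be compared against scatter.output_shape. This is only a sketch: the values are copied from the yml, and reading output_shape as [ny, nx] is an assumption based on the numbers matching, not something taken from the Open3D-ML source.

```python
# Values copied from the yml above.
# Range layout: x_min, y_min, z_min, x_max, y_max, z_max.
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]
voxel_size = [0.16, 0.16, 4]
scatter_output_shape = [496, 432]  # assumed to mean [ny, nx]


def grid_size(pc_range, vsize):
    """Number of voxels along x, y and z for a given range and voxel size."""
    return [round((pc_range[i + 3] - pc_range[i]) / vsize[i]) for i in range(3)]


nx, ny, nz = grid_size(point_cloud_range, voxel_size)
print(nx, ny, nz)

# True when the pillar grid and the scatter canvas agree.
consistent = [ny, nx] == scatter_output_shape
print(consistent)
```

With the default values the two agree, so the pillar grid itself does not look like the source of the shape error reported below.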


Training currently errors out (the point clouds are found, but it fails about 8% of the way into the epoch):

INFO - 2022-01-10 17:24:29,676 - object_detection - Logging in file : ./logs/PointPillars_KITTI_tf/log_train_2022-01-10_17:24:29.txt
INFO - 2022-01-10 17:24:29,676 - kitti - Found 3712 pointclouds for training
INFO - 2022-01-10 17:24:29,811 - object_detection - Restored from ./logs/PointPillars_KITTI_tf/checkpoint/ckpt-30
INFO - 2022-01-10 17:24:29,812 - object_detection - Writing summary in train_log/00013_PointPillars_KITTI_tf.
INFO - 2022-01-10 17:24:29,812 - object_detection - Started training
INFO - 2022-01-10 17:24:29,812 - object_detection - === EPOCH 146/200 ===
training: 0%| | 0/619 [00:00<?, ?it/s]
2022-01-10 17:24:29.819007: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-01-10 17:24:29.836870: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2400000000 Hz
2022-01-10 17:24:30.767434: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 266141696 exceeds 10% of free system memory.
2022-01-10 17:24:35.794663: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
training - loss_cls: 0.843 loss_bbox: 0.872 loss_dir: 0.172 > loss: 1.888: 8%| | 49/619 [14:37<2:4
2022-01-10 17:39:07.528121: W tensorflow/core/framework/op_kernel.cc:1755] Unknown: InvalidArgumentError: ConcatOp : Ranks of all input tensors should match: shape[0] = [2,7] vs. shape[4] = [0] [Op:ConcatV2] name: concat
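
One possible reading of the shapes in that last message: one sample in the batch carries a box array of rank 1 with zero elements ([0]) while the others are rank 2 ([2,7]). The sketch below reproduces the same kind of mismatch with NumPy rather than TensorFlow, and shows that reshaping every array to (-1, 7) before concatenating avoids it. The arrays and the helper concat_boxes are hypothetical illustrations, not Open3D-ML code, and whether an empty box list is really what triggers the failure here is an assumption.

```python
import numpy as np

# Hypothetical per-sample box arrays, mirroring the shapes in the error
# message: one sample with two 7-value boxes, one sample with no boxes.
boxes_per_sample = [
    np.zeros((2, 7), dtype=np.float32),
    np.zeros((0,), dtype=np.float32),  # rank 1, zero elements
]


def concat_boxes(samples, box_dim=7):
    """Concatenate per-sample boxes after forcing each array to rank 2."""
    fixed = [np.reshape(s, (-1, box_dim)) for s in samples]
    return np.concatenate(fixed, axis=0)


try:
    np.concatenate(boxes_per_sample, axis=0)
    naive_failed = False
except ValueError:
    # Arrays of rank 2 and rank 1 cannot be concatenated directly.
    naive_failed = True

merged = concat_boxes(boxes_per_sample)
print(naive_failed, merged.shape)
```

The equivalent idea in TensorFlow would be to apply tf.reshape(x, (-1, 7)) to each per-sample tensor before tf.concat, though I have not verified that against the batcher code.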

The InvalidArgumentError seems to be coming from:

File "/content/Desktop/point_pillars/env/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 961, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

File "/content/Desktop/point_pillars/Open3D-ML/ml3d/tf/models/point_pillars.py", line 341, in batcher
    bboxes = tf.concat([
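
Since the failure is in the batcher's box concatenation, it may help to know which frames have no objects of the trained classes. Below is a minimal sketch that scans a KITTI label directory for such frames. It relies only on the KITTI label format, where the first whitespace-separated field of each line is the object type. The function name and the directory argument are hypothetical, and frames could also end up without boxes after augmentation (for example ObjectRangeFilter), which this scan would not catch.

```python
from pathlib import Path

# Class names taken from the `classes` entry of the config.
CLASSES = {"Pedestrian", "Cyclist", "Car"}


def frames_without_classes(label_dir, classes=CLASSES):
    """Return the names of label files that contain none of `classes`."""
    empty = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        # The first field of every non-blank label line is the object type.
        types = {
            line.split()[0]
            for line in path.read_text().splitlines()
            if line.strip()
        }
        if not types & classes:
            empty.append(path.name)
    return empty
```

Calling frames_without_classes on the label_2 directory of the training split would list candidate frames to inspect or exclude.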

Given that I'm running straight master with no modifications, using the provided KITTI data and the default pointpillars_kitti.yml config, some help would be greatly appreciated.

Thanks!

jsohn123, Jan 11 '22 01:01