UniAD

FP16 training issue

Open ianz27 opened this issue 2 years ago • 0 comments

It seems FP16 training is NOT supported at the moment. Is support planned, or is there a way to fix it?

2023-09-08 14:40:34,180 - mmdet - INFO - workflow: [('train', 1)], max: 6 epochs
2023-09-08 14:40:34,182 - mmdet - INFO - Checkpoints will be saved to /local/zq/work/ad/UniAD/projects/work_dirs/stage1_track_map/base_track_map_3090 by HardDiskBackend.
Traceback (most recent call last):
  File "tools/train.py", line 256, in <module>
    main()
  File "tools/train.py", line 245, in main
    custom_train_model(
  File "/local/zq/work/ad/UniAD/projects/mmdet3d_plugin/uniad/apis/train.py", line 21, in custom_train_model
    custom_train_detector(
  File "/local/zq/work/ad/UniAD/projects/mmdet3d_plugin/uniad/apis/mmdet_train.py", line 194, in custom_train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 63, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local/zq/work/ad/UniAD/projects/mmdet3d_plugin/uniad/detectors/uniad_e2e.py", line 81, in forward
    return self.forward_train(**kwargs)
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 146, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/local/zq/work/ad/UniAD/projects/mmdet3d_plugin/uniad/detectors/uniad_e2e.py", line 163, in forward_train
    losses_track, outs_track = self.forward_track_train(img, gt_bboxes_3d, gt_labels_3d, gt_past_traj, gt_past_traj_mask, gt_inds, gt_sdc_bbox, gt_sdc_label,
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 146, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/local/zq/work/ad/UniAD/projects/mmdet3d_plugin/uniad/detectors/uniad_track.py", line 604, in forward_track_train
    frame_res = self._forward_single_frame_train(
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 146, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/local/zq/work/ad/UniAD/projects/mmdet3d_plugin/uniad/detectors/uniad_track.py", line 516, in _forward_single_frame_train
    out_track_instances = self.query_interact(tmp)
  File "/home/zq/anaconda3/envs/e2e/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local/zq/work/ad/UniAD/projects/mmdet3d_plugin/uniad/dense_heads/track_head_plugin/modules.py", line 252, in forward
    merged_track_instances = Instances.cat(
  File "/local/zq/work/ad/UniAD/projects/mmdet3d_plugin/uniad/dense_heads/track_head_plugin/track_instance.py", line 180, in cat
    values = torch.cat(values, dim=0)
RuntimeError: expected scalar type Float but found Half
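
The last frame suggests that the tensors reaching `torch.cat` in `Instances.cat` may have mixed dtypes (some Float, some Half). As a possible workaround, and only a sketch rather than the UniAD maintainers' fix, the tensors could be cast to a common dtype before concatenation. The helper name `cat_common_dtype` is hypothetical, and whether `torch.cat` promotes mixed dtypes on its own depends on the PyTorch version and device.

```python
from functools import reduce
from typing import Sequence

import torch


def cat_common_dtype(tensors: Sequence[torch.Tensor], dim: int = 0) -> torch.Tensor:
    """Hypothetical helper: cast all tensors to one promoted dtype, then concatenate."""
    if not tensors:
        raise ValueError("expected at least one tensor")
    # Promote pairwise across all inputs, e.g. float16 with float32 gives float32.
    common = reduce(torch.promote_types, (t.dtype for t in tensors))
    return torch.cat([t.to(common) for t in tensors], dim=dim)


# Small CPU example with one half-precision and one single-precision tensor.
half_part = torch.zeros(2, 3, dtype=torch.float16)
float_part = torch.ones(3, 3, dtype=torch.float32)
merged = cat_common_dtype([half_part, float_part])
```

Promoting to the wider dtype avoids losing precision, at the cost of some of the memory savings FP16 is meant to provide for that tensor.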

ianz27, Sep 08 '23 06:09