ViDAR: Visual Point Cloud Forecasting

Visual Point Cloud Forecasting enables Scalable Autonomous Driving [CVPR 2024 Highlight]
Zetong Yang, Li Chen, Yanan Sun, and Hongyang Li
- Presented by OpenDriveLab at Shanghai AI Lab
- :mailbox_with_mail: Primary contact: Zetong Yang ( [email protected] )
- arXiv paper | Video (YouTube, 5min) | Tutorial on World Model (Bilibili)
- CVPR 2024 Autonomous Driving Challenge - Predictive World Model
Highlights
:fire: Visual point cloud forecasting: a new self-supervised pre-training task for end-to-end autonomous driving that predicts future point clouds from historical visual inputs, jointly modeling 3D geometry and temporal dynamics for simultaneous perception, prediction, and planning.
:star2: ViDAR, the first visual point cloud forecasting architecture.

:trophy: Predictive world model, in the form of visual point cloud forecasting, will be a main track in the CVPR 2024 Autonomous Driving Challenge. Please stay tuned for further details!
News
- [2024/4] :fire: ViDAR pre-training on End-to-End Autonomous Driving (UniAD) is released. Please refer to the ViDAR-UniAD Page for more information.
- [2024/4] :fire: ViDAR pre-training on the nuScenes full set is released. Please check the configs for pre-training and fine-tuning. Corresponding models are available at pre-trained and fine-tuned.
- [2024/3] :fire: The predictive world model challenge is launched. Please refer to the link for more details.
- [2024/2] ViDAR code and models initially released.
- [2024/2] ViDAR is accepted by CVPR 2024.
- [2023/12] ViDAR paper released.
TODO List
Still in progress:
- [x] ViDAR-nuScenes-1/8 training and BEVFormer fine-tuning configurations.
- [x] ViDAR-OpenScene-mini training configurations. (Welcome to join the predictive world model challenge!)
- [x] ViDAR-nuScenes-full training and BEVFormer full fine-tuning configurations.
- [x] UniAD fine-tuning code and configuration.
Table of Contents
- Results and Model Zoo
- Installation
- Prepare Datasets
- Train and Evaluate
- License and Citation
- Related Resources
Results and Model Zoo
Visual point cloud forecasting pre-training
nuScenes Dataset:
| Pre-train Model | Dataset | Config | CD@1s | CD@2s | CD@3s | models & logs |
|---|---|---|---|---|---|---|
| ViDAR-RN101-nus-1-8-1future | nuScenes (12.5% Data) | vidar-nusc-pretrain-1future | - | - | - | models / logs |
| ViDAR-RN101-nus-1-8-3future | nuScenes (12.5% Data) | vidar-nusc-pretrain-3future | 1.25 | 1.48 | 1.79 | models / logs |
| ViDAR-RN101-nus-full-1future | nuScenes (100% Data) | vidar-nusc-pretrain-1future | - | - | - | models |
- HINT: Before running ViDAR on the full nuScenes set, please run python tools/merge_nusc_fullset_pkl.py to generate the nuscenes_infos_temporal_traintest.pkl used for pre-training (a sketch follows below).
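A minimal sketch of that step, assuming the default nuScenes layout under data/nuscenes; where the script writes the file is an assumption, so check merge_nusc_fullset_pkl.py for the actual output path:
# Generate the combined train+test info file used for full-set pre-training.
# The output location below is assumed, not confirmed by the script.
python tools/merge_nusc_fullset_pkl.py
ls data/nuscenes/nuscenes_infos_temporal_traintest.pkl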
OpenScene Dataset:
| Pre-train Model | Dataset | Config | CD@1s | CD@2s | CD@3s | models & logs |
|---|---|---|---|---|---|---|
| ViDAR-RN101-OpenScene-3future | OpenScene-mini (12.5% Data) | vidar-OpenScene-pretrain-3future-1-8 | 1.41 | 1.57 | 1.78 | models / logs |
| ViDAR-RN101-OpenScene-3future | OpenScene-mini-Full (100% Data) | vidar-OpenScene-pretrain-3future-full | 1.03 | 1.15 | 1.35 | models / logs |
Down-stream fine-tuning (Perception)
| Downstream Model | Dataset | pre-train | Config | NDS | mAP | models & logs |
|---|---|---|---|---|---|---|
| BEVFormer-Base (baseline) | nuScenes (25% Data) | FCOS3D | bevformer-base | 43.40 | 35.47 | models / logs |
| BEVFormer-Base | nuScenes (25% Data) | ViDAR-RN101-nus-1-8-1future | vidar-nusc-finetune-1future | 45.77 | 36.90 | models / logs |
| BEVFormer-Base | nuScenes (25% Data) | ViDAR-RN101-nus-1-8-3future | vidar-nusc-finetune-3future | 45.61 | 36.84 | models / logs |
| BEVFormer-Base (baseline) | nuScenes (100% Data) | FCOS3D | bevformer-base | 51.7 | 41.6 | models |
| BEVFormer-Base | nuScenes (100% Data) | ViDAR-RN101-nus-full-1future | vidar-nusc-finetune-1future | 55.33 | 45.20 | models |
Down-stream fine-tuning (End-to-End)
Please refer to ViDAR-UniAD page.
Installation
The installation steps are similar to those of BEVFormer. For convenience, we list them below:
conda create -n vidar python=3.8 -y
conda activate vidar
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
conda install -c omgarcia gcc-6 # (optional) gcc-6.2
Install mm-series packages.
pip install mmcv-full==1.4.0
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
# Install mmdetection3d from source codes.
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1 # Other versions may not be compatible.
python setup.py install
Install Detectron2 and Timm.
pip install einops fvcore seaborn iopath==0.1.9 timm==0.6.13 typing-extensions==4.5.0 pylint ipython==8.12 numpy==1.19.5 matplotlib==3.5.2 numba==0.48.0 pandas==1.4.4 scikit-image==0.19.3 setuptools==59.5.0
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
Setup ViDAR project.
git clone https://github.com/OpenDriveLab/ViDAR
cd ViDAR
mkdir pretrained
cd pretrained && wget https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth
# Install chamferdistance library.
cd third_lib/chamfer_dist/chamferdist/
pip install .
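As an optional sanity check (not an official script), the following imports should succeed with the versions pinned above; the chamferdist module name is assumed from the directory name:
# Rough environment check: mm-series packages and the chamfer distance extension.
python -c "import torch, mmcv, mmdet, mmseg, mmdet3d; print(torch.__version__, mmcv.__version__, mmdet.__version__, mmseg.__version__, mmdet3d.__version__)"
python -c "import chamferdist"  # module name assumed from third_lib/chamfer_dist/chamferdist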
Prepare Datasets
Train and Evaluate
Train
We recommend using 8 A100 GPUs for training. GPU memory usage is around 63 GB during pre-training.
- HINT: To save GPU memory, you can change supervise_all_future=True to False, and use smaller vidar_head_pred_history_frame_num and vidar_head_pred_future_frame_num values. For example, by setting supervise_all_future=False, vidar_head_pred_history_frame_num=0, vidar_head_pred_future_frame_num=0, and vidar_head_per_frame_loss_weight=(1.0,), the GPU memory consumption of the vidar-pretrain-3future model is reduced to ~34 GB. An example configuration is provided at link (see the sketch after this list).
- Full-nuScenes-Training: To pre-train ViDAR on the full nuScenes dataset, run python tools/merge_nusc_fullset_pkl.py beforehand to generate the nuscenes_infos_temporal_traintest.pkl for pre-training.
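For reference, a sketch of those memory-saving values as plain Python settings in a pre-training config; the names are taken verbatim from the hint above, but where exactly they live in the released config is an assumption, so prefer the linked example configuration:
# Memory-saving settings sketched from the hint above (names as given there;
# their placement in the released config may differ).
supervise_all_future = False               # do not supervise all future frames
vidar_head_pred_history_frame_num = 0      # no extra history frames predicted by the head
vidar_head_pred_future_frame_num = 0       # no extra future frames predicted by the head
vidar_head_per_frame_loss_weight = (1.0,)  # one loss weight per supervised frame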
CONFIG=path/to/config.py
GPU_NUM=8
./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
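For example, launching the 1/8-split 3-future pre-training from the table above could look like the following; the config path is illustrative and the actual location in the repository may differ:
# Illustrative invocation; substitute the real path of the chosen config.
CONFIG=projects/configs/vidar-nusc-pretrain-3future.py
GPU_NUM=8
./tools/dist_train.sh ${CONFIG} ${GPU_NUM}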
Evaluate
CONFIG=path/to/vidar_config.py
CKPT=path/to/checkpoint.pth
GPU_NUM=8
./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM}
Visualize
CONFIG=path/to/vidar_config.py
CKPT=path/to/checkpoint.pth
GPU_NUM=1
./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM} \
--cfg-options 'model._viz_pcd_flag=True' 'model._viz_pcd_path=/path/to/output'
License and Citation
All assets and code are under the Apache 2.0 license unless specified otherwise.
If this work is helpful for your research, please consider citing the following BibTeX entry.
@inproceedings{yang2023vidar,
title={Visual Point Cloud Forecasting enables Scalable Autonomous Driving},
author={Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
Related Resources
We acknowledge all the open-source contributors of the following projects for making this work possible: