About inferencing on real video

Open BaophanN opened this issue 11 months ago • 5 comments

Dear author, I have a video recorded from a camera mounted on a car. How can I use the UniAD model to run inference on this video?

BaophanN avatar Feb 17 '25 06:02 BaophanN

Dear @YTEP-ZHI @Yihanhu @faikit, could you kindly give us some recommendations for this? I saw a very old open issue on the same topic that still has no reply. Best regards ./.

BaophanN avatar Feb 25 '25 03:02 BaophanN

@BaophanN It involves heavy engineering work. The straightforward way is to align your data with the format of the nuScenes dataset.

ilnehc avatar Feb 25 '25 04:02 ilnehc

@ilnehc Thanks for your response. As far as I know, UniAD is vision-based; however, the model input also requires the nuScenes ego2global rotation and translation. If I have only a sequence of images, how can I obtain good predictions from the model without retraining on new data? Is there a way to get similar ego2global information given only my video input? Thank you ./.

BaophanN avatar Feb 25 '25 07:02 BaophanN

@BaophanN Driving in a 3D world requires at least the camera intrinsics and extrinsics. It is not as easy as tasks like 2D detection. You may try Structure from Motion methods to obtain them if they were not recorded.

ilnehc avatar Feb 25 '25 07:02 ilnehc
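
To make the role of intrinsics and extrinsics concrete, here is a minimal sketch of a pinhole projection in plain Python. It is only an illustration and is not taken from the UniAD code: the function name `project_point` and all numeric values are hypothetical. It assumes the extrinsics `R`, `t` map world coordinates to camera coordinates and that lens distortion is ignored.

```python
def project_point(p_world, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    R (3x3 nested lists) and t (length 3) are the extrinsics, assumed to map
    world coordinates to camera coordinates: p_cam = R @ p_world + t.
    fx, fy, cx, cy are the intrinsics (focal lengths and principal point,
    all in pixels). Lens distortion is ignored in this sketch.
    """
    p_cam = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3)]
    x, y, z = p_cam
    if z <= 0:
        return None  # the point is behind the camera, so it has no valid pixel
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical values: camera at the world origin, looking along +z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point([1.0, 0.5, 10.0], identity, [0.0, 0.0, 0.0],
                      fx=1000.0, fy=1000.0, cx=800.0, cy=450.0)
print(pixel)
```

Without `fx`, `fy`, `cx`, `cy` (intrinsics) and `R`, `t` (extrinsics), image pixels cannot be related to metric 3D positions, which is why a video alone is not enough for a 3D driving model.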

Thank you for your valuable response. May I ask one last question? I used Structure from Motion (COLMAP) to get the poses to pass to the UniAD model. However, what is the difference between the ego2global pose from nuScenes and the world2cam pose from a Structure from Motion model?

BaophanN avatar Feb 27 '25 03:02 BaophanN
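
The thread ends without a reply to this question. As a hedged sketch of the relationship: COLMAP stores each image pose as a world-to-camera transform (a quaternion `qw, qx, qy, qz` plus a translation `t`), while a nuScenes `ego_pose` gives the pose of the ego vehicle in the global frame, i.e. the opposite direction and for a different body (vehicle rather than camera). The plain-Python code below only inverts a world-to-camera pose into a camera-to-world pose. The function names and the example values are hypothetical, and this is not the UniAD data pipeline.

```python
import math


def quat_to_rot(qw, qx, qy, qz):
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix."""
    n = math.sqrt(qw * qw + qx * qx + qy * qy + qz * qz)
    w, x, y, z = qw / n, qx / n, qy / n, qz / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def invert_world2cam(qw, qx, qy, qz, t):
    """Invert p_cam = R @ p_world + t into a camera-to-world pose.

    Returns (R_c2w, center), where R_c2w = R^T and center = -R^T @ t is the
    camera position expressed in the SfM world frame.
    """
    R = quat_to_rot(qw, qx, qy, qz)
    R_c2w = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose
    center = [-sum(R_c2w[i][j] * t[j] for j in range(3)) for i in range(3)]
    return R_c2w, center


# Hypothetical pose: 90 degree rotation about z, translation (1, 2, 3).
s = math.sqrt(0.5)
R_c2w, center = invert_world2cam(s, 0.0, 0.0, s, [1.0, 2.0, 3.0])
print(center)
```

Even after this inversion, the result is not yet comparable to an ego2global pose. Three gaps remain, as far as the two conventions are documented: a monocular SfM reconstruction has an arbitrary scale and origin, whereas nuScenes poses are in meters; the pose belongs to the camera, so a camera-to-ego extrinsic (stored per sensor in nuScenes calibration records) is still needed; and the axis conventions differ (COLMAP cameras use x right, y down, z forward, while the nuScenes ego frame uses x forward, y left, z up).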