Manuel Alejandro Díaz Zapata
I think maybe another network architecture would suit road segmentation tasks better. [This PyTorch repository](https://github.com/meetshah1995/pytorch-semseg) has some interesting code; I've worked with it and it is really intuitive. [This is...
Maybe to add robustness to the system, so it does not depend on all cameras for each segment of the final grid? That way you can also see how much the other...
Hello @VeeranjaneyuluToka. The idea here is that by using the intrinsics, extrinsics, predicted categorical depth, and sum pooling, you could theoretically project the features coming from any number...
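For the sum-pooling part, here is a minimal sketch of the idea (not the repo's exact implementation; the grid size, cell size, and function name are made up for the example): every lifted feature already has a 3D position, so you just compute which BEV cell it falls into and accumulate the features from all cameras into that cell with a scatter-add.

```python
import torch

def splat_to_bev(geom, feats, bev_size=200, cell=0.5, bound=50.0):
    """Sum-pool lifted camera features into a BEV grid.

    geom:  (N, 3) 3D points (x, y, z) in the ego frame, one per feature vector
    feats: (N, C) feature vectors coming from any number of cameras
    Returns a (C, bev_size, bev_size) BEV feature map.
    """
    C = feats.shape[1]
    # Convert metric x/y coordinates to integer grid indices
    ix = ((geom[:, 0] + bound) / cell).long()
    iy = ((geom[:, 1] + bound) / cell).long()

    # Keep only the points that land inside the grid
    keep = (ix >= 0) & (ix < bev_size) & (iy >= 0) & (iy < bev_size)
    ix, iy, feats = ix[keep], iy[keep], feats[keep]

    # Sum pooling: features from every camera that sees a cell are added together,
    # so losing one camera only leaves some cells emptier, it does not break the grid
    bev = torch.zeros(bev_size * bev_size, C)
    bev.index_add_(0, ix * bev_size + iy, feats)
    return bev.view(bev_size, bev_size, C).permute(2, 0, 1)

# e.g. 6 cameras * 10k lifted points each, with 64-dim features
geom = (torch.rand(60000, 3) - 0.5) * 100
feats = torch.rand(60000, 64)
print(splat_to_bev(geom, feats).shape)  # torch.Size([64, 200, 200])
```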
First, let me start from what I (and the authors, I think) mean by categorical depth. Imagine that you would like to predict what objects are in front of your...
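To make that concrete, here is a tiny example of what "categorical" means here (the bin range is just an example): instead of regressing one metric depth value per pixel, the network outputs a score for each of D predefined depth bins, and a softmax turns those scores into a distribution over depths.

```python
import torch

D = 41                                      # number of discrete depth bins
depth_bins = torch.linspace(4.0, 45.0, D)   # example bins: 4 m to 45 m

# Pretend these are the per-pixel depth logits coming from the network for one pixel
logits = torch.randn(D)
depth_dist = logits.softmax(dim=0)          # categorical depth: one probability per bin

# You can still read off an expected depth, but the whole distribution is kept,
# which lets the model hedge between "this pixel is at 10 m" and "at 30 m"
expected_depth = (depth_dist * depth_bins).sum()
print(expected_depth)
```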
> Hi man, have you understood the code for projecting 2D images to 3D BEV, because I have the same questions. How to make frustum shape reconstruction rather...
Because this is the BEV representation, not a camera image, so it represents the space around the vehicle.
The `get_geometry` function uses the intrinsic and extrinsic matrices of the camera together with the different defined depths. This way we can see where each pixel is looking using...
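As a rough sketch of what that projection does (a simplified version, not the exact `get_geometry` code; the camera matrices below are made up for the example): each pixel is turned into a viewing ray with the inverse intrinsics, placed at every candidate depth, and then moved into the ego frame with the extrinsics.

```python
import torch

def pixel_to_ego(u, v, depths, K, R, t):
    """Project one pixel to 3D ego-frame points, once per candidate depth.

    u, v:    pixel coordinates
    depths:  (D,) tensor of candidate depths in metres
    K:       (3, 3) camera intrinsics
    R, t:    (3, 3) rotation and (3,) translation of the camera in the ego frame
    """
    pix = torch.tensor([u, v, 1.0])
    ray = torch.linalg.inv(K) @ pix              # viewing ray in camera coordinates
    cam_pts = depths[:, None] * ray[None, :]     # one 3D point per depth bin
    return cam_pts @ R.T + t                     # rotate/translate into the ego frame

# Toy intrinsics/extrinsics just for the example
K = torch.tensor([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
R, t = torch.eye(3), torch.tensor([1.5, 0.0, 1.6])
pts = pixel_to_ego(320.0, 240.0, torch.linspace(4.0, 45.0, 41), K, R, t)
print(pts.shape)  # torch.Size([41, 3])
```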
Take a look at this [paper](https://arxiv.org/pdf/1903.08960.pdf), where in equation 2 they describe how the projection is done from camera to 3D. In their case, they get the depth from a...
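For anyone skimming, the general pinhole back-projection that this kind of equation builds on looks roughly like this (written from memory in my own notation, so please check the paper for their exact formulation):

```latex
\begin{align}
  \mathbf{p}_{\text{cam}} &= z \, K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \\
  \mathbf{p}_{\text{ego}} &= R \, \mathbf{p}_{\text{cam}} + \mathbf{t},
\end{align}
```

where $(u, v)$ is the pixel, $z$ the depth, $K$ the intrinsics, and $(R, \mathbf{t})$ the extrinsics.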
The context vector contains the features. Both of them are generated in the last layer of the camera encoder. Here you can see the context vector, called [`depth`](https://github.com/nv-tlabs/lift-splat-shoot/blob/master/src/models.py#L56), and the features...
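In case it helps, a minimal sketch of that split (shapes and layer names are just for illustration, not the repo's exact code): one conv head predicts `D + C` channels, the first `D` become the categorical depth after a softmax, the remaining `C` are the context/features, and the outer product of the two does the lifting.

```python
import torch
import torch.nn as nn

D, C = 41, 64                     # depth bins and context/feature channels
x = torch.rand(1, 512, 8, 22)     # pretend feature map from the camera encoder

# One conv head predicts depth logits and the context vector together
head = nn.Conv2d(512, D + C, kernel_size=1)
out = head(x)

depth = out[:, :D].softmax(dim=1)     # categorical depth distribution per pixel
context = out[:, D:D + C]             # the "context vector" = per-pixel features

# Lifting: the outer product places a weighted copy of the features at every depth bin
frustum_feats = depth.unsqueeze(1) * context.unsqueeze(2)
print(frustum_feats.shape)  # torch.Size([1, 64, 41, 8, 22])
```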
> Extract features from each input image sample by considering the last 2 blocks of efficientnet (reduction_5 and reduction_4), let's say they are x1 and x2.

Yes, that is what...
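Roughly, fusing those two blocks looks like this (a sketch with example channel counts and shapes, mirroring the idea rather than the exact layers): upsample the deeper block, concatenate it with the shallower one, and fuse them with a couple of convolutions.

```python
import torch
import torch.nn as nn

# Pretend outputs of the last two EfficientNet blocks for a 128x352 input:
x2 = torch.rand(1, 112, 8, 22)   # reduction_4 (higher resolution)
x1 = torch.rand(1, 320, 4, 11)   # reduction_5 (deeper, lower resolution)

# Upsample the deeper block, concatenate with the shallower one, then fuse with convs
up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
fuse = nn.Sequential(
    nn.Conv2d(320 + 112, 512, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

x = fuse(torch.cat([up(x1), x2], dim=1))
print(x.shape)  # torch.Size([1, 512, 8, 22])
```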