Get access to 2d bounding boxes with respect to camera
Hi, thanks for your amazing work! I have a potentially silly question to ask. I need to access the 2D bounding boxes for each camera (front, left, right, back), but I could only find the bounding boxes for the equirectangular image. I access the 2D boxes with:
token = "tr1thGb4-HK8yPOzSZFHQQ-cam-front"
boxes = met.get_sample_data(token, get_all_visible_boxes=False)[2]
The result is a list of EquiBox2d. Do they refer to the projection of the bounding boxes onto a specific camera (e.g., the front one)?
My second question regards the content of the points attribute. It's a numpy array of shape (80, 2). Are these all the points of the bounding box (and in that case, do the four corners give the actual coordinates of the bounding box)?
Thanks in advance! :)
Hi @gianscarpe , you are already on the right track!
2D bounding boxes in Metropolis are annotated on the equirectangular images, which means that their edges map to curves, and not to straight lines, when seen from the perspective images. This is represented in the SDK by the EquiBox2d class. EquiBox2d is a discretized representation of one of these "deformed" boxes, where the points attribute contains its boundary expressed as a polygon. If you want to get a standard, axis-aligned box out of this, the easiest way would be to compute the bounding box of the points.
Note: this approach will give you bounding boxes that are not tight around the objects. Unfortunately, there's no way to obtain tight boxes on the perspective images given tight boxes on the equirectangular images without human re-annotation.
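A minimal sketch of the suggested approach: taking the axis-aligned bounding box of the discretized boundary polygon. The `points` array below is fabricated for illustration; in practice it would come from the `points` attribute of an EquiBox2d:

```python
import numpy as np

# Stand-in for EquiBox2d.points: a polygon boundary of shape (N, 2),
# here just four fabricated (x, y) vertices for illustration.
points = np.array([[10.0, 5.0], [30.0, 4.0], [32.0, 20.0], [11.0, 21.0]])

# Axis-aligned bounding box enclosing all polygon points.
x_min, y_min = points.min(axis=0)
x_max, y_max = points.max(axis=0)
bbox = (x_min, y_min, x_max, y_max)  # (10.0, 4.0, 32.0, 21.0)
```

As noted above, the resulting box encloses the deformed polygon, so it will generally be looser than a box annotated directly on the perspective image.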
Hi @ducksoup, thanks for your answer, it solved my problem! I have a couple more questions. I noticed that the SDK provides 2D bounding boxes only for a handful of classes, while the large majority of the categories are missing (e.g., buildings, vegetation, sidewalks). Is this intended? Another question regards the depth: I noticed that some objects (e.g., cars and pedestrians) are missing from the lidar data, I guess because of the SfM estimation. Is that correct? I attach a couple of examples, thank you for your time! :)

To answer your questions:
- Bounding boxes are provided only for "things", i.e. countable objects. These are the categories that were annotated by human annotators. Everything else ("stuff" classes, panoptic segmentations) is machine-generated and does not include bounding boxes.
- The depth images are generated using SfM, so they only capture the static, consistently 3D-reconstructible parts of the scene. Note that annotated moving objects such as cars and pedestrians are explicitly excluded from the reconstruction process.
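Because moving objects are excluded from the SfM reconstruction, the corresponding depth pixels will simply be missing. A small sketch of handling this, assuming (this encoding is an assumption, not confirmed by the SDK docs) that missing depth is stored as 0:

```python
import numpy as np

# Hypothetical SfM depth map; pixels with no reconstruction (e.g. moving
# cars and pedestrians excluded from SfM) are assumed to be encoded as 0.
depth = np.array([[1.5, 0.0],
                  [2.0, 3.5]])

valid = depth > 0                  # mask of reconstructed (static) pixels
mean_depth = depth[valid].mean()   # statistics over valid pixels only
```

If the SDK uses a different sentinel (e.g. NaN), the mask would be `~np.isnan(depth)` instead.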