The transformation matrix between cameras in TAPVid-3D
Dear Authors,
Thank you so much for your great work! It's really a great contribution to the field.
I am writing to ask whether it would be possible for you to provide the transformation matrices (delta poses) of the cameras for the DriveTrack and ADT splits. With these transformation matrices, we could decouple camera motion from object motion, which would greatly benefit the 3D vision field.
Best, Runsen
Hey Runsen,
Thanks for reaching out and the kind words.
Yup -- the camera extrinsics matrices are something we were thinking of releasing as well; I think we already have them for both those splits, buried in the scripts. As you probably noted, for Panoptic, the camera is fixed.
It's very helpful to know that there is demand for this, so thanks for writing in.
I can't promise a timeline for releasing this updated version with extrinsics, unfortunately -- the team is super busy with ICLR and other deadlines right now -- but it's something we ourselves would find useful, so it's high on our to-do list.
I'll leave this issue open until we get around to uploading those as well.
Hi Skanda,
Thanks for your reply! Really look forward to that!
Best, Runsen
Dear Authors,
Thank you for your exceptional work and this wonderful dataset!
I have a similar question: based on my understanding, the released 3D trajectories of key points are in the camera coordinate space. I was wondering if it is possible to release the camera extrinsic parameters or 3D trajectories in world coordinates. Any updates on this matter would be greatly appreciated.
Thank you once again for your valuable time!
Dear Authors,
Me too. Still waiting for the release of the camera extrinsics. I hope I'll be able to use them for my CVPR 2025 project. :)
Thanks again for your wonderful work.
Best, Runsen
Dear authors,
Really looking forward to the updates here! Your work is amazing; that's why I am still waiting for your release. :) Thank you again.
Best, Runsen
Same issue here. Thank you for this great work! It would be very helpful if you could provide camera poses or the points' 3D trajectories in world coordinates.
After checking the data structure, I found that the extrinsics for ADT can be obtained using the gt_provider.get_aria_3d_pose_by_timestamp_ns method in the script. However, it is still impossible to get the extrinsics for the DriveTrack split. I understand that releasing the ground-truth extrinsics might involve significant effort, but would it be possible to at least provide a mapping between the *.npz filenames and the original scene IDs (and timestamps) in the Waymo Open Dataset? This would be sufficient for us to retrieve the ground-truth poses on our end. Thanks.
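(For anyone needing that mapping before an official answer: the DriveTrack filenames appear to embed the Waymo segment context name directly. Below is a hypothetical parser sketch, assuming the unconfirmed pattern `tapvid3d_<segment context name>_<camera index>_<track hash>.npz` inferred from the released filenames:)

```python
import re

def parse_drivetrack_filename(name):
    """Split a TAPVid-3D DriveTrack filename into its apparent parts.

    Assumes (unverified) the pattern
    tapvid3d_<segment context name>_<camera index>_<track hash>.npz,
    where the segment context name follows the Waymo Open Dataset
    "<id>_<start>_<start2>_<end>_<end2>" convention (five digit groups).
    """
    m = re.match(r"tapvid3d_(\d+_\d+_\d+_\d+_\d+)_(\d+)_(.+)\.npz$", name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    segment, camera, track_hash = m.groups()
    return segment, int(camera), track_hash

print(parse_drivetrack_filename(
    "tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz"))
```

The segment part should then match a scene in the Waymo Open Dataset, from which the per-timestamp poses could be looked up on your end.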
Hi all, thanks for your patience on this. Extrinsics for ADT and DriveTrack are planned to be released soon, probably next week. I believe the filenames of the DriveTrack npz should match those from the original DriveTrack data, but Skanda can say more about this.
Thanks for your reply! That would be great. :)
Hi Runsen and Tianyuan,
Hope all is well! Huge thanks for your patience! We've just released TAPVid-3D with extrinsics (for the videos with moving camera -- Waymo Open videos and ADT videos). 🎉
You should be able to see a new extrinsics_w2c matrix in the *.npz files in the rc5 release version (e.g. https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/rc5/) and a visualization of the extrinsics is also available at the bottom of the demo Colab (https://colab.research.google.com/drive/1Ro2sE0lAvq-h0lixrUBB0oTYXEwXNr66?usp=sharing)
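(If you want to sanity-check what's in the new files, here is a minimal sketch of reading the field -- using a synthetic in-memory .npz as a stand-in for a downloaded rc5 file; only the field names `tracks_XYZ` and `extrinsics_w2c` come from the release:)

```python
import io
import numpy as np

# Build a tiny stand-in .npz with the same field names the release uses;
# in practice you would np.load() a downloaded rc5 file instead.
buf = io.BytesIO()
num_frames, num_tracks = 4, 7
np.savez(
    buf,
    tracks_XYZ=np.zeros((num_frames, num_tracks, 3), dtype=np.float32),
    extrinsics_w2c=np.tile(np.eye(4, dtype=np.float32), (num_frames, 1, 1)),
)
buf.seek(0)

data = np.load(buf)
print(data["extrinsics_w2c"].shape)  # (4, 4, 4): one 4x4 world-to-camera matrix per frame
print(data["tracks_XYZ"].shape)      # (4, 7, 3)
```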
I'll close this for now, but feel free to re-open if you have questions.
Hi, thanks for releasing the extrinsics! Does this mean I need to re-run the download command for each dataset to get the latest *.npz files with extrinsics? Btw, the link https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/rc5/ reports: "NoSuchKey: The specified key does not exist. No such object: dm-tapnet/tapvid3d/release_files/rc5/".
Hi, yes -- you will need to re-download, as the extrinsics are packaged into a new set of *.npz files that contain everything (video + annotations).
Apologies about that link -- I've confirmed the files are working and downloadable. Directly linking to the directory doesn't work; only linking to individual files works that way (e.g. https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/rc5/drivetrack/tapvid3d_10096619443888687526_2820_000_2840_000_2_ZOPKTfO1L4TG1PIZ8DIdEA.npz should work!).
I'll close this for now, but feel free to re-open if you have questions.
Hi,
Thank you so much for providing the extrinsics.
I've been working with tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz (the sample used in the demo) and noticed what seems to be a significant misalignment when transforming the same 3D track across consecutive frames. Specifically, I use the per-frame extrinsics (extrinsics_w2c[i]) to transform tracks_XYZ[i] from camera coordinates to world coordinates and compare the resulting positions across frames for the same track. As shown in the code snippet below, the same 3D point appears to shift by about 0.5 meters between neighboring frames, even though these points are visibly static in the video.
```python
import numpy as np

npz_file = np.load(
    "tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz")
tracks_xyz = npz_file['tracks_XYZ']          # shape: (num_frames, num_tracks, 3)
extrinsics_w2c = npz_file['extrinsics_w2c']  # shape: (num_frames, 4, 4)
num_frames, num_tracks, _ = tracks_xyz.shape

all_world_points = np.zeros((num_frames, num_tracks, 3), dtype=np.float32)
for i in range(num_frames):
    scene_points = tracks_xyz[i]      # points in camera coordinates
    w2c = extrinsics_w2c[i]
    c2w = np.linalg.inv(w2c)          # camera-to-world
    ones = np.ones((num_tracks, 1))
    points_h = np.concatenate([scene_points, ones], axis=-1)
    points_world_h = points_h @ c2w.T
    points_world = points_world_h[:, :3] / points_world_h[:, 3:]
    all_world_points[i] = points_world

# For each track across frames, check positions in world coords:
for t in range(num_tracks):
    track_positions = all_world_points[:, t, :]
    diffs = np.linalg.norm(track_positions[1:] - track_positions[:-1], axis=-1)
    print(diffs)
```
```
# when t == 0:
diffs
array([0.53178436, 0.51016134, 0.48404232, 0.5151894 , 0.5285841 ,
       0.5186206 , 0.4924652 , 0.48030394, 0.49436355, 0.5212535 ,
       0.5359043 , 0.5205395 , 0.5047182 , 0.50314695, 0.50608414,
       0.4934266 , 0.4732564 , 0.4676542 , 0.46894327, 0.4698973 ,
       0.4684759 , 0.46728835, 0.4689295 , 0.46779054], dtype=float32)
# when t == 1
diffs
array([0.53238755, 0.5103867 , 0.4840829 , 0.5160158 , 0.52974313,
       0.51969886, 0.49327874, 0.48056367, 0.49499694, 0.52228683,
       0.53710014, 0.52166265, 0.5053245 , 0.50366944, 0.50665784,
       0.4936775 , 0.47286656, 0.4672455 , 0.468741  , 0.46957228,
       0.4680983 , 0.46692365, 0.46911004, 0.468038  ], dtype=float32)
# when t == -1
diffs
array([0.5302407 , 0.51038945, 0.48535085, 0.5143417 , 0.5266145 ,
       0.5173515 , 0.49292427, 0.4814093 , 0.49443805, 0.51919323,
       0.53252304, 0.51832366, 0.503406  , 0.50175214, 0.50455767,
       0.49279687, 0.4748173 , 0.4694223 , 0.47015733, 0.47095898,
       0.4693982 , 0.46798575, 0.46855757, 0.46721664], dtype=float32)
```
I understand there may be some estimation error in the camera extrinsics, but 0.5 m per frame seems too big to be just minor noise. Could there be an issue with the provided extrinsics?
I have checked several other samples in DriveTrack, and they show similar errors; for example, tapvid3d_574762194520856849_1660_000_1680_000_1_p0zQEBrZsA0eJvmQAWy7CQ can have up to 1.2 m of error.
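(For reference, the per-frame loop in the snippet above can be collapsed into a single vectorized transform -- just an equivalent reformulation for convenience, not an official utility:)

```python
import numpy as np

def tracks_to_world(tracks_xyz, extrinsics_w2c):
    """Vectorized camera-to-world transform for all frames at once.

    tracks_xyz: (num_frames, num_tracks, 3) points in camera coordinates.
    extrinsics_w2c: (num_frames, 4, 4) world-to-camera matrices.
    Returns (num_frames, num_tracks, 3) points in world coordinates.
    """
    c2w = np.linalg.inv(extrinsics_w2c)  # invert each 4x4 in the batch
    ones = np.ones((*tracks_xyz.shape[:2], 1), dtype=tracks_xyz.dtype)
    points_h = np.concatenate([tracks_xyz, ones], axis=-1)   # (F, T, 4)
    world_h = np.einsum("fij,ftj->fti", c2w, points_h)       # per-frame matmul
    return world_h[..., :3] / world_h[..., 3:]

# Sanity check: with identity extrinsics, world == camera coordinates.
pts = np.random.default_rng(0).normal(size=(5, 3, 3))
eye = np.tile(np.eye(4), (5, 1, 1))
assert np.allclose(tracks_to_world(pts, eye), pts)
```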
Hi Runsen! Good catch -- thanks for providing the sample code; we've added looking into this to our to-do queue.
I'll re-open the issue. It's super hectic at the moment with some ongoing work, so I can't promise a very quick turnaround, but hopefully one of us will be able to help you out shortly.
Hi Runsen,
Could this difference be due to the presence of dynamic objects? Such objects will naturally have varying 3D positions in world coordinates.
Best, Ignacio
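(One way to test this hypothesis empirically is to measure how much each track's world-frame position varies over time; a rough sketch, where the 0.1 m threshold is an arbitrary illustration value, not something from the dataset:)

```python
import numpy as np

def flag_dynamic_tracks(world_points, thresh=0.1):
    """Label tracks as dynamic if their world position deviates more than
    `thresh` meters from their temporal mean position.

    world_points: (num_frames, num_tracks, 3) in world coordinates.
    Returns a boolean array of shape (num_tracks,).
    """
    spread = np.linalg.norm(world_points - world_points.mean(axis=0), axis=-1)
    return spread.max(axis=0) > thresh

static = np.zeros((10, 1, 3))                    # one point fixed in world coords
moving = np.linspace(0, 5, 10).reshape(10, 1, 1) * np.ones((10, 1, 3))
pts = np.concatenate([static, moving], axis=1)   # (10, 2, 3)
print(flag_dynamic_tracks(pts))                  # → [False  True]
```

If visibly static points come back flagged as dynamic, the discrepancy is more likely in the extrinsics than in genuine object motion.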
Hi Ignacio,
I do not think so. I have visualized the points over the video, and the points are static. This file, tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz, is the one used by your demo https://colab.research.google.com/drive/1Ro2sE0lAvq-h0lixrUBB0oTYXEwXNr66?usp=sharing, and you can see there that all the points are static.
Thank you so much for your reply!
Best, Runsen
Hi Runsen,
I verified Apartment_release_meal_seq138_4 and all looks correct to me.
Best, Ignacio
Hi Ignacio,
I think the extrinsics in the ADT split are correct, and the problem lies in the DriveTrack split. Could you check the example I mentioned?
Best, Runsen
Hi Ignacio,
Do you have similar findings to mine?
Thank you very much for your attention to this!
Best, Runsen
Hi Runsen,
I've updated the files in the v1.0 release. Could you check if the new file looks ok?
https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/v1.0/drivetrack/tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz
Best, Ignacio
Hi Ignacio,
I have checked the file https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/v1.0/drivetrack/tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz.
This time the extrinsics look much more reasonable than before: the average per-frame position error is now about 0.015 m, compared with 0.47 m previously.
Very glad to see the problem being fixed; thank you for your effort.
Still, there remains about 0.015 m of error between frames -- do you think something is still wrong?
For visualization, below are the point trajectories in world coordinates. Because these points are from a static car, they are supposed to be static.
https://github.com/user-attachments/assets/6b39b069-1dbd-46af-823b-2d2e70c13bf7
Best, Runsen
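(To quantify residual drift independently of the tracks, one can also compare consecutive camera centers recovered from the w2c matrices. A sketch, assuming the standard convention x_cam = R x_world + t, so the center is C = -R^T t:)

```python
import numpy as np

def camera_centers(extrinsics_w2c):
    """Recover camera centers in world coordinates from world-to-camera
    matrices: with x_cam = R x_world + t, the center satisfies C = -R^T t.

    extrinsics_w2c: (num_frames, 4, 4). Returns (num_frames, 3).
    """
    R = extrinsics_w2c[:, :3, :3]
    t = extrinsics_w2c[:, :3, 3]
    return -np.einsum("fij,fi->fj", R, t)  # batched -R^T t

# Sanity check: a pure translation t places the center at -t.
w2c = np.tile(np.eye(4), (3, 1, 1))
w2c[:, :3, 3] = [1.0, -2.0, 0.5]
centers = camera_centers(w2c)
# Per-frame camera displacement (all zeros here, since the pose is constant):
per_frame_motion = np.linalg.norm(np.diff(centers, axis=0), axis=-1)
```

Comparing `per_frame_motion` against the per-frame point drift would show whether the remaining ~0.015 m error correlates with camera motion.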
Somewhat related to this: in the data processing script for ADT, the 'extrinsics_w2c' field is not transferred from the input .npz files to the output .npz files.
I guess it should be done here?
Are the annotations given in the input .npz files accurate/reliable, or still a WIP?