The transformation matrix between cameras in TAPVid-3D
Dear Authors,
Thank you so much for your great work! It's really a great contribution to the field.
I am writing to ask whether it would be possible for you to provide the transformation matrices (delta poses) of the cameras for the DriveTrack and ADT splits. With these transformation matrices, we could decouple camera motion from object motion, which would greatly benefit the 3D vision field.
Best, Runsen
Hey Runsen,
Thanks for reaching out and the kind words.
Yup -- the camera extrinsics matrices are something we were thinking of releasing as well; I think we already have them for both those splits, buried in the scripts. As you probably noted, for Panoptic, the camera is fixed.
It's very helpful to know that there is demand for this, so thanks for writing in.
I can't promise a timeline for releasing this updated version with extrinsics, unfortunately -- the team is super busy with ICLR and other deadlines right now -- but it's something we ourselves would find useful, so it's high on our to-do list.
I'll leave this issue open until we get around to uploading those as well.
Hi Skanda,
Thanks for your reply! Really look forward to that!
Best, Runsen
Dear Authors,
Thank you for your exceptional work and this wonderful dataset!
I have a similar question: based on my understanding, the released 3D trajectories of key points are in the camera coordinate space. I was wondering if it is possible to release the camera extrinsic parameters or 3D trajectories in world coordinates. Any updates on this matter would be greatly appreciated.
Thank you once again for your valuable time!
Dear Authors,
Me too. Still waiting for the release of the camera extrinsics. I hope I'll be able to use them for my CVPR 2025 project. :)
Thanks again for your wonderful work.
Best, Runsen
Dear authors,
Really looking forward to the updates here! Your work is amazing; that's why I am still waiting for your release. :) Thank you again.
Best, Runsen
Same issue here. Thank you for this great work! It would be very helpful if you could provide camera poses or the points' 3D trajectories in world coordinates.
After checking the data structure, I found that the extrinsics for ADT can be obtained using the gt_provider.get_aria_3d_pose_by_timestamp_ns method in the script. However, it is still impossible to get the extrinsics for the DriveTrack split. I understand that releasing the ground-truth extrinsics might involve significant effort, but would it be possible to at least provide a mapping between the *.npz filenames and the original scene IDs (and timestamps) in the Waymo Open Dataset? This would be sufficient for us to retrieve the ground-truth poses on our end. Thanks.
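(For anyone needing that mapping before an official answer: the DriveTrack filenames appear to embed the Waymo segment context name directly. Below is a hypothetical parser sketch, assuming the unconfirmed pattern `tapvid3d_<segment context name>_<camera index>_<track hash>.npz` inferred from the released filenames:)

```python
import re

def parse_drivetrack_filename(name):
    """Split a TAPVid-3D DriveTrack filename into its apparent parts.

    Assumes (unverified) the pattern
    tapvid3d_<segment context name>_<camera index>_<track hash>.npz,
    where the segment context name follows the Waymo Open Dataset
    "<id>_<start>_<start2>_<end>_<end2>" convention (five digit groups).
    """
    m = re.match(r"tapvid3d_(\d+_\d+_\d+_\d+_\d+)_(\d+)_(.+)\.npz$", name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    segment, camera, track_hash = m.groups()
    return segment, int(camera), track_hash

print(parse_drivetrack_filename(
    "tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz"))
```

The segment part should then match a scene in the Waymo Open Dataset, from which the per-timestamp poses could be looked up on your end.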
Hi all, thanks for your patience on this. Extrinsics for ADT and DriveTrack are planned to be released soon, probably next week. I believe the filenames of the DriveTrack npz should match those from the original DriveTrack data, but Skanda can say more about this.
Thanks for your reply! That would be great. :)
Hi Runsen and Tianyuan,
Hope all is well! Huge thanks for your patience! We've just released TAPVid-3D with extrinsics (for the videos with moving camera -- Waymo Open videos and ADT videos). 🎉
You should be able to see a new extrinsics_w2c matrix in the *.npz files in the rc5 release version (e.g. https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/rc5/) and a visualization of the extrinsics is also available at the bottom of the demo Colab (https://colab.research.google.com/drive/1Ro2sE0lAvq-h0lixrUBB0oTYXEwXNr66?usp=sharing)
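(If you want to sanity-check what's in the new files, here is a minimal sketch of reading the field -- using a synthetic in-memory .npz as a stand-in for a downloaded rc5 file; only the field names `tracks_XYZ` and `extrinsics_w2c` come from the release:)

```python
import io
import numpy as np

# Build a tiny stand-in .npz with the same field names the release uses;
# in practice you would np.load() a downloaded rc5 file instead.
buf = io.BytesIO()
num_frames, num_tracks = 4, 7
np.savez(
    buf,
    tracks_XYZ=np.zeros((num_frames, num_tracks, 3), dtype=np.float32),
    extrinsics_w2c=np.tile(np.eye(4, dtype=np.float32), (num_frames, 1, 1)),
)
buf.seek(0)

data = np.load(buf)
print(data["extrinsics_w2c"].shape)  # (4, 4, 4): one 4x4 world-to-camera matrix per frame
print(data["tracks_XYZ"].shape)      # (4, 7, 3)
```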
I'll close this for now, but feel free to re-open if you have questions.
Hi, thanks for releasing the extrinsics! Does this mean I need to re-run the download command for each dataset to get the latest *.npz files with extrinsics? Btw, the link https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/rc5/ reports: "NoSuchKey: The specified key does not exist. No such object: dm-tapnet/tapvid3d/release_files/rc5/".
Hi, yes -- you will need to re-download, as the extrinsics are packaged into a new set of *.npz files that contain everything (video + annotations).
Apologies about that link -- I've confirmed the files are working and downloadable. Directly linking to the directory doesn't work; only linking to individual files works that way (e.g. https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/rc5/drivetrack/tapvid3d_10096619443888687526_2820_000_2840_000_2_ZOPKTfO1L4TG1PIZ8DIdEA.npz should work!).
I'll close this for now, but feel free to re-open if you have questions.
Hi,
Thank you so much for providing the extrinsics.
I've been working with tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz (the sample used in the demo) and noticed what seems to be a significant misalignment when transforming the same 3D track across consecutive frames. Specifically, I use the per-frame extrinsics (extrinsics_w2c[i]) to transform tracks_XYZ[i] from camera coordinates to world coordinates and compare the resulting positions across frames for the same track. As shown in the code snippet below, the same 3D point appears to shift by about 0.5 meters between neighboring frames, even though these points are visibly static in the video.
```python
import numpy as np

npz_file = np.load(
    "tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz")
tracks_xyz = npz_file['tracks_XYZ']          # shape: (num_frames, num_tracks, 3)
extrinsics_w2c = npz_file['extrinsics_w2c']  # shape: (num_frames, 4, 4)
num_frames, num_tracks, _ = tracks_xyz.shape

all_world_points = np.zeros((num_frames, num_tracks, 3), dtype=np.float32)
for i in range(num_frames):
    scene_points = tracks_xyz[i]      # points in camera coordinates
    w2c = extrinsics_w2c[i]
    c2w = np.linalg.inv(w2c)          # camera-to-world
    ones = np.ones((num_tracks, 1))
    points_h = np.concatenate([scene_points, ones], axis=-1)
    points_world_h = points_h @ c2w.T
    points_world = points_world_h[:, :3] / points_world_h[:, 3:]
    all_world_points[i] = points_world

# For each track across frames, check positions in world coords:
for t in range(num_tracks):
    track_positions = all_world_points[:, t, :]
    diffs = np.linalg.norm(track_positions[1:] - track_positions[:-1], axis=-1)
    print(diffs)
```
```
# when t == 0:
diffs
array([0.53178436, 0.51016134, 0.48404232, 0.5151894 , 0.5285841 ,
       0.5186206 , 0.4924652 , 0.48030394, 0.49436355, 0.5212535 ,
       0.5359043 , 0.5205395 , 0.5047182 , 0.50314695, 0.50608414,
       0.4934266 , 0.4732564 , 0.4676542 , 0.46894327, 0.4698973 ,
       0.4684759 , 0.46728835, 0.4689295 , 0.46779054], dtype=float32)
# when t == 1
diffs
array([0.53238755, 0.5103867 , 0.4840829 , 0.5160158 , 0.52974313,
       0.51969886, 0.49327874, 0.48056367, 0.49499694, 0.52228683,
       0.53710014, 0.52166265, 0.5053245 , 0.50366944, 0.50665784,
       0.4936775 , 0.47286656, 0.4672455 , 0.468741  , 0.46957228,
       0.4680983 , 0.46692365, 0.46911004, 0.468038  ], dtype=float32)
# when t == -1
diffs
array([0.5302407 , 0.51038945, 0.48535085, 0.5143417 , 0.5266145 ,
       0.5173515 , 0.49292427, 0.4814093 , 0.49443805, 0.51919323,
       0.53252304, 0.51832366, 0.503406  , 0.50175214, 0.50455767,
       0.49279687, 0.4748173 , 0.4694223 , 0.47015733, 0.47095898,
       0.4693982 , 0.46798575, 0.46855757, 0.46721664], dtype=float32)
```
I understand there may be some estimation error in the camera extrinsics, but 0.5 m per frame seems too big to be just minor noise. Could there be an issue with the provided extrinsics?
I have checked several other samples in DriveTrack, and they show similar errors; for example, tapvid3d_574762194520856849_1660_000_1680_000_1_p0zQEBrZsA0eJvmQAWy7CQ can have up to 1.2 m of error.
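(For reference, the per-frame loop in the snippet above can be collapsed into a single vectorized transform -- just an equivalent reformulation for convenience, not an official utility:)

```python
import numpy as np

def tracks_to_world(tracks_xyz, extrinsics_w2c):
    """Vectorized camera-to-world transform for all frames at once.

    tracks_xyz: (num_frames, num_tracks, 3) points in camera coordinates.
    extrinsics_w2c: (num_frames, 4, 4) world-to-camera matrices.
    Returns (num_frames, num_tracks, 3) points in world coordinates.
    """
    c2w = np.linalg.inv(extrinsics_w2c)  # invert each 4x4 in the batch
    ones = np.ones((*tracks_xyz.shape[:2], 1), dtype=tracks_xyz.dtype)
    points_h = np.concatenate([tracks_xyz, ones], axis=-1)   # (F, T, 4)
    world_h = np.einsum("fij,ftj->fti", c2w, points_h)       # per-frame matmul
    return world_h[..., :3] / world_h[..., 3:]

# Sanity check: with identity extrinsics, world == camera coordinates.
pts = np.random.default_rng(0).normal(size=(5, 3, 3))
eye = np.tile(np.eye(4), (5, 1, 1))
assert np.allclose(tracks_to_world(pts, eye), pts)
```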
Hi Runsen! Good catch -- thanks for providing the sample code; we've added looking into this to our to-do queue.
I'll re-open the issue. It's super hectic at the moment with some ongoing work, so I can't promise a very quick turnaround, but hopefully one of us will be able to help you out shortly.
Hi Runsen,
Could this difference be due to the presence of dynamic objects? Such objects will naturally have varying 3D positions in world coordinates.
Best, Ignacio
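(One way to test this hypothesis empirically is to measure how much each track's world-frame position varies over time; a rough sketch, where the 0.1 m threshold is an arbitrary illustration value, not something from the dataset:)

```python
import numpy as np

def flag_dynamic_tracks(world_points, thresh=0.1):
    """Label tracks as dynamic if their world position deviates more than
    `thresh` meters from their temporal mean position.

    world_points: (num_frames, num_tracks, 3) in world coordinates.
    Returns a boolean array of shape (num_tracks,).
    """
    spread = np.linalg.norm(world_points - world_points.mean(axis=0), axis=-1)
    return spread.max(axis=0) > thresh

static = np.zeros((10, 1, 3))                    # one point fixed in world coords
moving = np.linspace(0, 5, 10).reshape(10, 1, 1) * np.ones((10, 1, 3))
pts = np.concatenate([static, moving], axis=1)   # (10, 2, 3)
print(flag_dynamic_tracks(pts))                  # → [False  True]
```

If visibly static points come back flagged as dynamic, the discrepancy is more likely in the extrinsics than in genuine object motion.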
Hi Ignacio,
I do not think so. I have visualized the points over the video, and the points are static. This file, tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz, is the one used by your demo https://colab.research.google.com/drive/1Ro2sE0lAvq-h0lixrUBB0oTYXEwXNr66?usp=sharing, and you can see there that all the points are static.
Thank you so much for your reply!
Best, Runsen
Hi Runsen,
I verified Apartment_release_meal_seq138_4 and all looks correct to me.
Best, Ignacio
Hi Ignacio,
I think the extrinsics in the ADT split are correct, and the problem lies in the DriveTrack split. Could you check the example I mentioned?
Best, Runsen
Hi Ignacio,
Do you have similar findings to mine?
Thank you very much for your attention to this!
Best, Runsen
Hi Runsen,
I've updated the files in the v1.0 release. Could you check if the new file looks ok?
https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/v1.0/drivetrack/tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz
Best, Ignacio
Hi Ignacio,
I have checked the file https://storage.googleapis.com/dm-tapnet/tapvid3d/release_files/v1.0/drivetrack/tapvid3d_9142545919543484617_86_000_106_000_2_5AKc-TYQochsSWXpv376cA.npz.
This time the extrinsics look much more reasonable than before: the average per-frame position error is now about 0.015 m, compared with 0.47 m previously.
Very glad to see the problem being fixed; thank you for your effort.
Still, there remains about 0.015 m of error between frames -- do you think something is still wrong?
For visualization, below are the point trajectories in world coordinates. Because these points are from a static car, they are supposed to be static.
https://github.com/user-attachments/assets/6b39b069-1dbd-46af-823b-2d2e70c13bf7
Best, Runsen
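(To quantify residual drift independently of the tracks, one can also compare consecutive camera centers recovered from the w2c matrices. A sketch, assuming the standard convention x_cam = R x_world + t, so the center is C = -R^T t:)

```python
import numpy as np

def camera_centers(extrinsics_w2c):
    """Recover camera centers in world coordinates from world-to-camera
    matrices: with x_cam = R x_world + t, the center satisfies C = -R^T t.

    extrinsics_w2c: (num_frames, 4, 4). Returns (num_frames, 3).
    """
    R = extrinsics_w2c[:, :3, :3]
    t = extrinsics_w2c[:, :3, 3]
    return -np.einsum("fij,fi->fj", R, t)  # batched -R^T t

# Sanity check: a pure translation t places the center at -t.
w2c = np.tile(np.eye(4), (3, 1, 1))
w2c[:, :3, 3] = [1.0, -2.0, 0.5]
centers = camera_centers(w2c)
# Per-frame camera displacement (all zeros here, since the pose is constant):
per_frame_motion = np.linalg.norm(np.diff(centers, axis=0), axis=-1)
```

Comparing `per_frame_motion` against the per-frame point drift would show whether the remaining ~0.015 m error correlates with camera motion.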
Somewhat related to this: in the data processing script for ADT, the 'extrinsics_w2c' field is not transferred from the input .npz files to the output .npz files.
I guess it should be done here?
Are the annotations given in the input .npz files accurate/reliable, or still a WIP?