Charles Lu
You should upsample the image to (224, 224). ResNet downsamples its input by 32x, and ViT needs a fixed-size input to split into patches, so smaller images will cause problems with both backbones.
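To make the size argument concrete, here is a rough sketch of the arithmetic. It assumes a standard ResNet with a total stride of 32 and a ViT with 16x16 patches; the helper names and the 64-pixel example input are hypothetical, not taken from the repository.

```python
import math


def resnet_feature_size(input_size: int, total_stride: int = 32) -> int:
    """Side length of the final feature map, assuming a total stride of 32.

    Each stride-2 stage with the usual padding roughly halves the side
    length and rounds up, so five stages give ceil(input_size / 32).
    """
    return math.ceil(input_size / total_stride)


def vit_num_patches(input_size: int, patch_size: int = 16) -> int:
    """Number of patches for a square image, assuming 16x16 patches."""
    if input_size % patch_size != 0:
        raise ValueError("input size must be a multiple of the patch size")
    side = input_size // patch_size
    return side * side


# A hypothetical 64x64 input leaves ResNet with only a 2x2 feature map,
# while the upsampled 224x224 input gives the usual 7x7 map.
small_map = resnet_feature_size(64)
full_map = resnet_feature_size(224)

# At 224x224, ViT sees a 14x14 grid of patches (196 in total).
patches = vit_num_patches(224)
```

With a fixed patch grid, the positional embeddings of a pretrained ViT are sized for that grid, which is why feeding a different resolution without resizing tends to fail or degrade.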
Dear Authors, Thank you for your exceptional work and this wonderful dataset! I have a similar question: based on my understanding, the released 3D trajectories of key points are in...
The values in annot_dict["depth"] look similar to the data used here: https://github.com/google-research/kubric/blob/0ee21e2a723b2131123d67e55d1f65b6d0e6cf0f/challenges/point_tracking/dataset.py#L536-L540