Unable to reproduce claimed DAVIS performance with a public checkpoint.
Hello,
I am trying to reproduce the numbers reported in the CoTracker3 paper using the public checkpoint (the scaled offline model). Unfortunately, I am not able to obtain the claimed 64.4 AJ on DAVIS First reported in Table 1 of https://arxiv.org/pdf/2410.11831.
Here is the code I was using to compute the metrics. I ran it in a fresh conda environment with python=3.10, torch 2.5.1, and CUDA 12.4; the code is copied from my Jupyter notebook cells.
!git clone https://github.com/facebookresearch/co-tracker
%cd co-tracker
!pip install -e .
!pip install opencv-python matplotlib moviepy flow_vis
!mkdir cotracker_checkpoints
%cd cotracker_checkpoints
!wget https://huggingface.co/facebook/cotracker3/resolve/main/scaled_offline.pth
%cd ../..
!wget https://storage.googleapis.com/dm-tapnet/tapvid_davis.zip
!unzip tapvid_davis.zip
# install this to properly import tapnet
!pip install jax chex einshape dm-haiku optax tensorflow-cpu mediapy
!pip install tensorflow_datasets
!git clone https://github.com/google-deepmind/tapnet
import os
import torch
import sys
sys.path.insert(0, 'co-tracker')
sys.path.insert(0, 'tapnet')
!wget https://raw.githubusercontent.com/google-deepmind/tapnet/refs/heads/main/tapnet/tapvid/evaluation_datasets.py
from evaluation_datasets import create_davis_dataset, compute_tapvid_metrics
from collections import defaultdict
from tqdm import tqdm
import numpy as np
davis = create_davis_dataset(
    'tapvid_davis/tapvid_davis.pkl',
    query_mode='first'
)
davis_data = [p for p in davis]
from cotracker.predictor import CoTrackerPredictor
model = CoTrackerPredictor(
    checkpoint='./co-tracker/cotracker_checkpoints/scaled_offline.pth',
    offline=True,
    window_len=60  # the checkpoint seems to require this window length
)
model = model.cuda()
model.support_grid_size = 5
all_metrics = defaultdict(list)
for j in tqdm(range(len(davis_data))):
    video = davis_data[j]['davis']['video']
    query_points = davis_data[j]['davis']['query_points']
    video = torch.from_numpy(video).permute(0, 1, 4, 2, 3).cuda().float()
    query_points = torch.from_numpy(query_points).cuda().float()
    # reorder queries from (t, y, x) to (t, x, y) as the predictor expects
    query_points = torch.stack(
        [query_points[..., 0], query_points[..., 2], query_points[..., 1]], dim=2
    )
    pred_tracks, pred_visibility = model(video, queries=query_points)
    metrics = compute_tapvid_metrics(
        davis_data[j]['davis']['query_points'],
        davis_data[j]['davis']['occluded'],
        davis_data[j]['davis']['target_points'],
        (~pred_visibility).transpose(1, 2).cpu().numpy(),
        pred_tracks.cpu().transpose(1, 2).numpy(),
        query_mode='first'
    )
    for k in metrics:
        all_metrics[k].append(metrics[k].item())

for k in all_metrics:
    print(f'{k}: {np.mean(all_metrics[k]) * 100:.3f}')
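A note on the visibility handling above: compute_tapvid_metrics expects occlusion flags with shape (batch, num_points, num_frames), so the predicted visibility is inverted and transposed from (B, T, N) to (B, N, T). Here is a tiny self-contained check of that conversion (dummy tensors, not model output):

```python
import torch

# dummy predicted visibility for B=1 video, T=4 frames, N=3 tracks
pred_visibility = torch.tensor([[[True, False, True],
                                 [True, True, False],
                                 [False, True, True],
                                 [True, True, True]]])  # (1, 4, 3)

# invert visibility -> occlusion, then swap the time and point axes
occluded = (~pred_visibility).transpose(1, 2).cpu().numpy()  # (1, 3, 4)

# a point is occluded exactly when it is not visible:
# track 2 is not visible in frame 1, so it is occluded there
assert occluded[0, 2, 1]
```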
This code prints an AJ of around 58.1 and an OA of around 85.4.
To the best of my knowledge, this code uses only the global 5x5 support grid. I also tried adding extra local support points here: https://github.com/facebookresearch/co-tracker/blob/main/cotracker/predictor.py#L146 . Even though I followed the strategy proposed in Appendix A (5x5 global points + 8x8 local points, plus an extra loop in the notebook that runs inference with one query point at a time and concatenates the resulting trajectories), the numbers did not change much.
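For reference, this is roughly how I build the combined support points for a single query. The helper below is my own sketch, not code from the repo: the grid sizes follow Appendix A, but the local-window extent (±50 px) and the clamping to image bounds are my assumptions.

```python
import torch

def make_support_points(query_xy, H, W, global_size=5, local_size=8, local_extent=50.0):
    """Sketch: a 5x5 global grid over the frame plus an 8x8 local grid
    centered on the query point (clamped to image bounds)."""
    # global grid spanning the whole image
    gx = torch.linspace(0, W - 1, global_size)
    gy = torch.linspace(0, H - 1, global_size)
    global_pts = torch.stack(torch.meshgrid(gx, gy, indexing='ij'), dim=-1).reshape(-1, 2)
    # local grid centered on the query point
    lx = torch.linspace(-local_extent, local_extent, local_size) + query_xy[0]
    ly = torch.linspace(-local_extent, local_extent, local_size) + query_xy[1]
    local_pts = torch.stack(torch.meshgrid(lx, ly, indexing='ij'), dim=-1).reshape(-1, 2)
    # keep local points inside the image
    local_pts[:, 0] = local_pts[:, 0].clamp(0, W - 1)
    local_pts[:, 1] = local_pts[:, 1].clamp(0, H - 1)
    # (5*5 + 8*8, 2) = (89, 2) support points in (x, y) order
    return torch.cat([global_pts, local_pts], dim=0)

pts = make_support_points(torch.tensor([100.0, 60.0]), H=256, W=256)
print(pts.shape)  # torch.Size([89, 2])
```

The per-query loop then runs the model once per query with these support points appended, and keeps only the trajectory of the query itself before concatenating across queries.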
UPD: when I use the scaled online model, the performance is better, but it is still far from the claimed number.
Here is how I create the model:
from cotracker.predictor import CoTrackerPredictor
model = CoTrackerPredictor(
    checkpoint='./co-tracker/cotracker_checkpoints/scaled_online.pth',
    offline=False,
    window_len=16
)
The numbers I get are AJ ~61 and OA ~89.
Hi @artemZholus, could you try evaluating the model with the provided evaluation predictor by following these steps? https://github.com/facebookresearch/co-tracker?tab=readme-ov-file#evaluation