In PETR, randomly initialized queries are used without a query positional encoding derived from reference points, so Section 3.3 seems consistent with the code of this repo.
In contrast, two-stage Deformable DETR embeds its queries with their initial bboxes.
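For illustration, here is a minimal sketch of the two initialization schemes in PyTorch; the names (`query_embed`, `sine_pos_from_reference`, `pos_proj`) and the exact encoding are my assumptions, not this repo's code:

```python
import math
import torch
import torch.nn as nn

embed_dims, num_queries = 256, 300

# PETR-style (as in this repo): free learnable query embeddings, with no
# positional encoding derived from reference points.
query_embed = nn.Embedding(num_queries, embed_dims)

# Two-stage Deformable-DETR-style: a query positional embedding generated
# from the initial reference boxes (sinusoidal encoding + projection).
def sine_pos_from_reference(ref_boxes, num_feats=64, temperature=10000):
    """ref_boxes: (num_queries, 4) normalized (cx, cy, w, h)
    -> (num_queries, 4 * num_feats)."""
    dim_t = torch.arange(num_feats, dtype=torch.float32)
    dim_t = temperature ** (2 * torch.div(dim_t, 2, rounding_mode='floor') / num_feats)
    pos = ref_boxes[..., None] * (2 * math.pi) / dim_t   # (Q, 4, num_feats)
    pos = torch.stack((pos[..., 0::2].sin(), pos[..., 1::2].cos()), dim=-1)
    return pos.flatten(1)                                # (Q, 4 * num_feats)

ref_boxes = torch.rand(num_queries, 4)       # initial reference boxes
pos_proj = nn.Linear(4 * 64, embed_dims)     # project to embed_dims
query_pos = pos_proj(sine_pos_from_reference(ref_boxes))
```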
_Hence, I think there is still some difference between Section 3.3 and the code, where the locations of the initial reference points are randomly initialized and learned._ -> Sorry, I checked it. `The initial...
I checked the code, and I think the two-stage mode (the default setting in the code) means that the initial reference points for the decoder are initialized from the top...
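For reference, a rough sketch of that two-stage selection, assuming Deformable-DETR-style encoder outputs; `enc_scores` and `enc_proposals` are hypothetical names, not taken from this repo:

```python
import torch

# Hypothetical encoder outputs: one objectness score and one proposal box
# (in logit space) per encoder token.
batch, num_tokens, num_queries = 2, 10000, 300
enc_scores = torch.rand(batch, num_tokens)         # per-token objectness
enc_proposals = torch.randn(batch, num_tokens, 4)  # box logits (unactivated)

# Two-stage mode: pick the top-k scoring proposals and use them as the
# initial reference points/boxes for the decoder.
topk = torch.topk(enc_scores, num_queries, dim=1).indices      # (batch, k)
init_reference = torch.gather(
    enc_proposals, 1, topk.unsqueeze(-1).expand(-1, -1, 4))    # (batch, k, 4)
init_reference = init_reference.sigmoid()  # normalized (cx, cy, w, h)
```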
Yes, the number of sampling offsets for one reference point (i.e., one keypoint) is set to **1 for each head and feature level**, so there are 32 sampling offsets for...
It can be visualized like this: green is the reference point and the other points are sampling points (sampling points with low attention weights are omitted).
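To make the count concrete, here is a small shape sketch of the offset prediction under the setting above (8 heads × 4 feature levels × 1 point; all variable names are illustrative):

```python
import torch
import torch.nn as nn

embed_dims, num_heads, num_levels, num_points = 256, 8, 4, 1

# One (x, y) offset per head, per level, per point:
# 8 heads * 4 levels * 1 point = 32 sampling offsets per reference point.
sampling_offsets = nn.Linear(embed_dims, num_heads * num_levels * num_points * 2)

query = torch.rand(2, 17, embed_dims)  # (batch, num_keypoints, embed_dims)
offsets = sampling_offsets(query).view(2, 17, num_heads, num_levels, num_points, 2)
print(num_heads * num_levels * num_points)  # -> 32
```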
**Evaluation**
`bash tools/dist_test.sh $CONFIG $CHECKPOINT $NUM_GPU --eval keypoints`
ex) `CUDA_VISIBLE_DEVICES=1,2 bash tools/dist_test.sh configs/petr/petr_r50_16x2_100e_coco.py checkpoint/petr_r50_16x2_100e_coco.pth 2 --eval keypoints`

**Training**
`bash tools/dist_train.sh $CONFIG $NUM_GPU`
ex) `CUDA_VISIBLE_DEVICES=1,2 bash tools/dist_train.sh configs/petr/petr_r50_16x2_100e_coco.py 2`

**Inference**
ex)...
Did you download the COCO dataset and annotation files? You should download the annotation files and images from [https://cocodataset.org/#download], and fix the data root (`data_root = '/dataset/public/coco/'`) in [configs/_base_/datasets/coco_keypoint.py].
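For example, the relevant excerpt of `configs/_base_/datasets/coco_keypoint.py` would look roughly like this (the path and layout below are illustrative; adjust to where you placed COCO):

```python
# configs/_base_/datasets/coco_keypoint.py (excerpt, illustrative)
data_root = '/path/to/your/coco/'  # replace '/dataset/public/coco/' with your path
# the downloaded files are then expected under, e.g.:
#   /path/to/your/coco/annotations/person_keypoints_train2017.json
#   /path/to/your/coco/train2017/
#   /path/to/your/coco/val2017/
```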
I couldn't find video inference code in this repository. I recommend converting mp4 videos into png frames for inference, or modifying the code to support video. I am sorry that I...
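A minimal frame-extraction sketch with OpenCV (`input.mp4` and `frames/` are placeholder paths):

```python
import os
import cv2

# Split an mp4 into numbered png frames so the image inference scripts
# can be run on them.
os.makedirs('frames', exist_ok=True)
cap = cv2.VideoCapture('input.mp4')
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(os.path.join('frames', f'{idx:06d}.png'), frame)
    idx += 1
cap.release()
```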
Convert your dataset to the webdataset format, then specify the tar file location in the cfg file.
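A minimal conversion sketch with the `webdataset` package; the shard name, sample list, and the `jpg`/`json` keys are assumptions, so match them to whatever keys your dataloader expects:

```python
import json
import webdataset as wds

# (image path, annotation) pairs to pack; replace with your own dataset.
samples = [('img_000001.jpg', {'keypoints': []})]

with wds.TarWriter('train-0000.tar') as sink:
    for i, (img_path, ann) in enumerate(samples):
        with open(img_path, 'rb') as f:
            img_bytes = f.read()
        sink.write({
            '__key__': f'sample{i:06d}',       # unique key per sample
            'jpg': img_bytes,                  # raw jpeg bytes
            'json': json.dumps(ann).encode(),  # annotation as json bytes
        })
```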