One-Example-Person-ReID: A question about the processing of video datasets

Hello! For video datasets such as MARS, I would like to ask: is a tracklet treated like a single-frame image? And are all the frames in a tracklet input into the network at the same time?

Nov 28 '20 07:11 xiaonvxia

Hi,

For the first question, yes.

For the second one, 16 frames out of the tracklet are input into the network during training. But for inference, we input all frames at the same time.

Nov 28 '20 15:11 Yu-Wu
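
The training/inference split described in the reply above can be illustrated with a minimal sketch. It is not taken from the repository: the function names, the uniform random sampling, and the sampling with replacement for tracklets shorter than 16 frames are all assumptions made for illustration. The reply only states that 16 frames are used in training and all frames at inference.

```python
import random


def sample_training_frames(frames, num_frames=16, rng=None):
    """Pick a fixed-size subset of a tracklet for one training step.

    Hypothetical helper: the sampling strategy here is an assumption,
    not the repository's confirmed behaviour.
    """
    rng = rng or random.Random()
    if len(frames) >= num_frames:
        # Enough frames: sample without replacement, keep temporal order.
        indices = sorted(rng.sample(range(len(frames)), num_frames))
    else:
        # Short tracklet: sample with replacement to reach the fixed size.
        indices = sorted(rng.choices(range(len(frames)), k=num_frames))
    return [frames[i] for i in indices]


def inference_frames(frames):
    """At inference every frame of the tracklet is used (hypothetical helper)."""
    return list(frames)
```

For a 100-frame tracklet, `sample_training_frames(tracklet)` returns 16 frame paths, while `inference_frames(tracklet)` returns all 100. Fixing the training input at 16 frames keeps the per-batch memory cost constant regardless of tracklet length.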

Why do you select 16 frames instead of inputting them all into the network during training?

Nov 29 '20 03:11 xiaonvxia

Because we do not have that much GPU memory for training. The largest tracklet has more than 1,000 frames, so feeding it in whole would cost about 60 times the GPU memory of a 16-frame input (1,000 / 16 ≈ 62).

Nov 29 '20 05:11 Yu-Wu

Thank you very much for your reply!

Nov 29 '20 07:11 xiaonvxia