One-Example-Person-ReID

A question about the processing of video datasets

Open xiaonvxia opened this issue 5 years ago • 4 comments

Hello! For video datasets such as MARS, I would like to ask whether a tracklet is treated like a single-frame image. Also, are all the frames in a tracklet input into the network at the same time?

xiaonvxia, Nov 28 '20 07:11

Hi,

For the first question, yes.

For the second one: during training, 16 frames from the tracklet are input into the network. For inference, we input all frames of the tracklet at the same time.
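The training/inference difference described above can be sketched roughly as follows. This is only an illustration, not the repository's code: the function name `select_frames`, the random-sampling strategy, and the handling of tracklets shorter than 16 frames are all assumptions, since the reply does not say how the 16 frames are chosen.

```python
import random


def select_frames(tracklet, num_frames=16, training=True, rng=random):
    """Pick the frames of one tracklet that are fed to the network.

    Hypothetical helper: the sampling strategy is an assumption.
    """
    if not training:
        # Inference: use every frame of the tracklet.
        return list(tracklet)

    indices = range(len(tracklet))
    if len(tracklet) >= num_frames:
        # Assumed: sample without replacement, then restore temporal order.
        chosen = sorted(rng.sample(indices, num_frames))
    else:
        # Assumed: short tracklets are padded by sampling with replacement.
        chosen = sorted(rng.choices(indices, k=num_frames))
    return [tracklet[i] for i in chosen]


# Usage with placeholder frame names instead of real images.
frames = [f"frame_{i:04d}.jpg" for i in range(1000)]
train_input = select_frames(frames, training=True)   # 16 frames
test_input = select_frames(frames, training=False)   # all 1000 frames
```

Fixing the number of training frames keeps every batch the same shape, which is one common reason for this kind of sampling.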

Yu-Wu, Nov 28 '20 15:11

Why do you select 16 frames instead of inputting them all into the network when training?

xiaonvxia, Nov 29 '20 03:11

Because we do not have enough GPU memory for that during training. The largest tracklet has more than 1,000 frames, which would need roughly 60x the GPU memory of a 16-frame input.
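As a back-of-the-envelope check of the 60x figure, here is a tiny sketch. It assumes that memory for the input and its activations grows linearly with the number of frames, which is a simplification; `relative_memory_cost` is a hypothetical helper, not something from the repository.

```python
def relative_memory_cost(tracklet_len, train_frames=16):
    """Ratio of frames in a full tracklet to frames in a sampled input.

    Assumes per-frame memory cost is constant (linear scaling).
    """
    return tracklet_len / train_frames


# A 1,000-frame tracklet versus the 16 frames used in training.
print(relative_memory_cost(1000))  # 62.5
```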

Yu-Wu, Nov 29 '20 05:11

Thank you very much for your reply!

xiaonvxia, Nov 29 '20 07:11