Questions about charades_video_tsn.py
Thank you for sharing the code. It helps a lot, but I have some confusion about the data loading procedure in charades_video_tsn.py. Here is the code:
```python
n = self.data['datas'][index]['n']
if shift is None:
    shift = np.random.randint(n - self.train_gap - 2)
else:
    shift = int(shift * (n - self.train_gap - 2))
ss = [shift] + [np.random.randint(n - self.train_gap - 2)
                for _ in range(self.segments - 1)]
```
This way, ss is a list of indices of the frames to be loaded. But the values in ss don't appear to be regularly spaced, which seems inconsistent with TSN.
Did I miss something? By the way, what's the difference among charades_tsn.py, charades_video.py and charades_video_xx.py?
Looking forward to your reply.
The TSN part of the codebase was very experimental, so feel free to submit a pull request if you get good results.
I believe this is a version of TSN that picks a "center segment" and then randomly samples points before and after that segment (the sampled offsets are sorted here: https://github.com/gsig/PyVideoResearch/blob/46307b1a03ce670696297e2154ddee6f4e6b0b8a/datasets/charades_video_tsn.py#L25)
The original TSN code chooses something like 3 equally spaced segments, but we were extending it to larger videos, so we introduced some random sampling.
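Roughly, the difference can be sketched like this (a simplified illustration, not the repo's actual code; `n`, `train_gap`, and `segments` follow the snippet quoted above):

```python
import numpy as np

def tsn_uniform_starts(n, train_gap, segments):
    """Classic TSN: split the usable range into equal segments, jitter within each."""
    valid = n - train_gap - 2
    edges = np.linspace(0, valid, segments + 1).astype(int)
    return [int(np.random.randint(lo, max(hi, lo + 1)))
            for lo, hi in zip(edges[:-1], edges[1:])]

def center_plus_random_starts(n, train_gap, segments, shift=None):
    """This variant: one 'center' start plus (segments - 1) random starts, then sort."""
    valid = n - train_gap - 2
    shift = np.random.randint(valid) if shift is None else int(shift * valid)
    ss = [shift] + [np.random.randint(valid) for _ in range(segments - 1)]
    return sorted(int(s) for s in ss)

print(tsn_uniform_starts(300, 64, 3))         # roughly evenly spaced starts
print(center_plus_random_starts(300, 64, 3))  # irregularly spaced starts
```

So the irregular spacing you noticed in ss is expected with this scheme.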
charades_tsn.py just borrows code from the video version, but returns individual frames instead of video clips. "Video clip" is the term I use for a stack of video frames.
charades_video.py returns a video clip instead of a single frame.
The charades_video_xx.py files are different versions of the charades_video.py dataloader, with different data augmentations, number of clips sampled at test time, etc.
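To make the frame vs. clip distinction concrete, the item shapes look roughly like this (dimension order is an assumption for illustration; the repo's transforms may arrange it differently):

```python
import torch

# Assumed shapes for illustration only; exact dimension order may differ in the repo.
frame = torch.zeros(3, 224, 224)     # charades_tsn.py-style item: a single RGB frame (C, H, W)
clip = torch.zeros(64, 3, 224, 224)  # charades_video*.py-style item: a "clip", i.e. a stack
                                     # of train_gap consecutive frames (T, C, H, W)
print(frame.shape, clip.shape)
```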
Hope that helps, Gunnar
Thanks for your patient explanation. I will try some experiments and leave a comment on this issue if I get any new findings.