XPretrain

Hi, how should I understand LF-hdvila-8m?

Open sunwhw opened this issue 1 year ago • 1 comments

Is each line in 'lfvila8m_clipid.jsonl' a pair of video clips and a sentence? I also see a varying number of video clips per row. How are the video clips in 'lfvila8m_clipid.jsonl' derived from the original 'hdvila_clip_text_100m.jsonl'? Besides selecting videos with more than 4 clips, as mentioned in the paper, are there any other details?

sunwhw, Apr 06 '24 16:04

Where can I find the annotation file containing the video captions, "hdvila_clip_text_100m.jsonl"? Thanks.

GXYM, May 08 '24 17:05