How to understand LF-hdvila-8m?

Open sunwhw opened this issue 1 year ago • 1 comment

Is each line in 'lfvila8m_clipid.jsonl' a pairing of video clips with a sentence? I also see a variable number of video clips per row. How were the video clips in 'lfvila8m_clipid.jsonl' split out from the original 'hdvila_clip_text_100m.jsonl'? Beyond selecting videos with more than 4 clips, as mentioned in the paper, are there any other details?

Apr 06 '24 16:04 sunwhw

Where can I find the annotation file containing the video captions, "hdvila_clip_text_100m.jsonl"? Thanks.

May 08 '24 17:05 GXYM