InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Results 170 InternVideo issues

Hi! Could you kindly clarify the details of the released [InternVideo v1](https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo1) models? On what data was InternVideo-MM-L-14 pretrained? The GitHub page says WebVid10M+Self-collected (14M), while in the paper it’s WebVid2M, WebVid10M, and HowTo100M....

![image](https://github.com/OpenGVLab/InternVideo/assets/121412945/d8361bce-cef0-43f2-8c6e-dbff69da1ba5) When will the model interaction part be released?

I'd like to know how to download the InternVid-10M-FLT dataset. It seems that neither Hugging Face nor OpenDataLab provides access to the original videos for download.

I see there are 3 subsets: DIV, FLT, and the aesthetic version. What are the filtering criteria used for DIV and FLT, and what do they stand for?

I wonder if the 6k actions are available. Thank you

In your "zero-shot text-to-video retrieval" setting, you only use 1 GPU for eval. I would like to ask how to use multiple GPUs for evaluation. Can I change...

Hello, thanks for releasing the code of this cool paper! I would like to try the Temporal Action Localization on my own custom data. I have generated raw_frames for each...