InternVideo issues

InternVideo-MM-L-14 pretraining datasets

Hi! Could you kindly clarify something about the released [InternVideo v1](https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo1) models? What data was InternVideo-MM-L-14 pretrained on? The GitHub page says WebVid10M+Self-collected (14M), while the paper says WebVid2M, WebVid10M, and HowTo100M....

ninatu

Model interaction part of InternVideo1

![image](https://github.com/OpenGVLab/InternVideo/assets/121412945/d8361bce-cef0-43f2-8c6e-dbff69da1ba5) When will the model interaction part be released?

lynshwoo2022

InternVid dataset download

3

I'd like to know how to download the InternVid-10M-FLT dataset. It seems that neither Hugging Face nor OpenDataLab provides access to the original videos for download.

jqsun98

What do DIV and FLT stand for?

3

I see there are 3 subsets: DIV, FLT, and the aesthetic version. What are the filtering criteria used for DIV and FLT, and what do they stand for?

vedantroy

Are the 6k action words available?

1

I wonder if the 6k action words are available. Thank you.

shuohan

How to set up multi-GPU evaluation of zero-shot performance?

In your "zero-shot text-to-video retrieval" setting, you only use 1 GPU for evaluation. I would like to ask how to use multiple GPUs for evaluation. Can I change...

1240446371

Extract features from custom data.

9

Hello, thanks for releasing the code of this cool paper! I would like to try the Temporal Action Localization on my own custom data. I have generated raw_frames for each...

svenssona
