InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Results 170 InternVideo issues

Thank you for your great work! I downloaded the finetuned model provided in your model zoo: https://huggingface.co/OpenGVLab/InternVideo2-Stage1-1B-224p-f8-thSth/blob/main/1B_ft_ssv2_f8.pth (with 77.1% top-1 accuracy reported on SthV2) and prepared the SthV2 dataset according...

When I run this demo.ipynb, I encounter the following import error. Theoretically, Python should be able to successfully import from the current directory using ..utils.easydict. Can you explain why this...

If the QR code has expired, you can add this WeChat ID to be invited to the group: yx116169 ![20240716-114139](https://github.com/user-attachments/assets/7b465095-9367-4e2d-bac8-f9eea0a0b01e)

Hello, nice job! I cannot reproduce the MSRVTT finetuned model, and I set each argument as in the [log](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/retrieval/msrvtt/kc4_finetune_1e-32e-3_77words_12frames_128_16_bothdsl/log.txt). I also checked each possible problem, such as the dataloader or the...

Dear authors, can you share some details about how we can generate captions for new videos in the same manner as was done for InternVid? From the paper, you generated...

Thank you for your incredible work! I would like to use the InternVid-Aesthetics-18M dataset to train some video generation models, but didn't find any available instructions or documentation on how...

Dear authors, great work and thanks for releasing the code for ViCLIP pretraining on InternVid-10M-FLT. First, it would be really great if the pre-training instructions were more detailed, like which...

Hi, great work and thanks for releasing the code. In Table 10 of your InternVideo2 paper, you reported the results of finetuning video retrieval in both T2V and V2T on...

Dear authors, I am trying to reproduce the zero-shot performance with the checkpoint [ViCLIP-L-14 InternVid-10M-FLT](https://huggingface.co/OpenGVLab/ViCLIP). However, the performance differs from the numbers reported in the paper. Here are the results...

Hello authors, Table 16 in the InternVideo2 paper reports a score of 60.9 on MVBench, with VideoChat2 + InternVideo2s3-1B + Mistral-7B. However, the highest score on the MVBench leaderboard is 60.4...