InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

170 InternVideo issues, sorted by recently updated

Thank you so much for the awesome repo. Can you please share the 6k action words? It would be useful to perform zero shot classification of videos into those 6K...

Hello InternVideo team, You guys have done a great job with this project! In your paper, you use the Stage 2 model for the task of temporal grounding on QVHighlight...

Problem: using the demo to test action classification on the Kinetics-700 validation set gives very poor results. Experiment: 1. Pretrained model: https://huggingface.co/OpenGVLab/InternVideo2-Stage2_1B-224p-f4/tree/main 2. Text candidates: the class names of the K700 dataset...
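
For context, zero-shot action classification with a video-text model usually reduces to ranking prompt-wrapped class names against the video embedding. A minimal sketch, assuming embeddings have already been produced by the Stage 2 demo (the tensor shapes and helper names below are illustrative assumptions, not the repo's actual interface):

```python
# Minimal sketch: zero-shot classification by cosine similarity between a
# video embedding and K700 class-name embeddings. All names are illustrative;
# the demo's real encoding functions may differ.
import torch
import torch.nn.functional as F

def zero_shot_topk(video_emb: torch.Tensor,   # (D,) from the video encoder
                   class_embs: torch.Tensor,  # (C, D) from the text encoder
                   class_names: list[str],
                   k: int = 5,
                   temperature: float = 0.01):
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(class_embs, dim=-1)
    probs = (t @ v / temperature).softmax(dim=-1)   # (C,) class probabilities
    top = probs.topk(k)
    return [(class_names[i], p.item()) for p, i in zip(top.values, top.indices)]
```

Low accuracy with bare class names is often improved by wrapping each label in a prompt template such as "a video of a person {label}", though whether that explains the poor result reported here is only a guess.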

I ran this script, and timm.models.create_model raised RuntimeError: Unknown model (internvideo2_1B_patch14_224). How can I resolve this?
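
For reference, timm.create_model can only resolve architectures that have been registered with timm's @register_model decorator, so the module defining internvideo2_1B_patch14_224 has to be imported before the call. A minimal sketch, assuming the model is defined in an importable module of the repo (the module path below is an assumption):

```python
# Sketch: importing the defining module registers the architecture with
# timm's model registry; only then can create_model resolve the name.
import timm

# Assumed module path; replace with wherever the repo defines the model.
from models import internvideo2_stage2  # noqa: F401  (import triggers registration)

model = timm.create_model("internvideo2_1B_patch14_224", pretrained=False)
print(type(model))
```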

Thanks for the paper and for open-sourcing the code base. I would like to know how evaluation is performed on the MSR-VTT dataset for zero-shot text-to-video...
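
For orientation, zero-shot text-to-video retrieval on MSR-VTT is typically scored by ranking every video for each caption and reporting Recall@K and median rank. A minimal, library-agnostic sketch over a precomputed similarity matrix (it assumes query i's ground-truth video sits at column i, as in the usual 1k test-split setup; this is not taken from the repo's evaluation code):

```python
# Sketch: Recall@K and median rank from a text-by-video similarity matrix.
import numpy as np

def retrieval_metrics(sim: np.ndarray) -> dict:
    """sim[i, j] = similarity of text query i to video j; GT video of query i is j = i."""
    order = np.argsort(-sim, axis=1)  # best-matching video first, per query
    ranks = np.array([np.where(order[i] == i)[0][0] for i in range(sim.shape[0])])
    return {
        "R@1":  round(float((ranks < 1).mean() * 100), 1),
        "R@5":  round(float((ranks < 5).mean() * 100), 1),
        "R@10": round(float((ranks < 10).mean() * 100), 1),
        "MedR": float(np.median(ranks) + 1),
    }
```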

It seems scripts/pretraining/1B_pt.sh needs 1.1M.tsv, whose format should be: # line format: source, path, total_time, start_time, end_time, target. But [UniFormerV2] provides # line format: path, id, so where...
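
As a rough illustration of the gap between the two annotation layouts, a conversion step might look like the sketch below. The meaning of the six columns (and what belongs in source and target) is an assumption rather than documented behavior, and total_time would need to come from probing each video:

```python
# Sketch: turn UniFormerV2-style "path<TAB>id" lines into the six-column
# "source, path, total_time, start_time, end_time, target" layout that
# 1B_pt.sh appears to expect. Column semantics are assumed, not confirmed.
import csv

def convert_tsv(src_tsv: str, dst_tsv: str, source_name: str = "webvid") -> None:
    with open(src_tsv, newline="") as fin, open(dst_tsv, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout, delimiter="\t")
        for path, vid_id in reader:
            total_time = 0.0                  # placeholder: probe real duration, e.g. via ffprobe
            start_time, end_time = 0.0, total_time
            target = vid_id                   # placeholder: likely a caption, not an id
            writer.writerow([source_name, path, total_time, start_time, end_time, target])
```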

Dear Authors, how can I use the InternVideo2 model for Video Question Answering or Summarization tasks given a video? Please provide a demo script, if any, for testing on new...

I request the authors to release fine-tuning code for the InternVideo2 model with multimodality: [multimodal finetuning](https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2/multi_modality#finetuning)

Thanks for the great work! Is it possible to adapt the model for video prediction? And if so, what decoder model should I use? Thanks for any suggestions!

Hi, the link to the InternVideo2 checkpoint returns a 404.