InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Results 170 InternVideo issues

I would like to fine-tune InternVideo2_Chat_8B_InternLM2_5 on a specific task. How should I do this? Thanks!

Thanks for such beautiful work! In the past, the similarity between video and text was usually computed by calculating the similarity between each frame and the text using text-image CLIP, and then take...

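Hello, how do I run the internvideo2+videochat2 model? After downloading internvideo2_s3, which videochat2 weights should I download? Their repository has only released the weights for the UMT encoder. Thank you very much!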

Hi guys, Thank you for providing this valuable dataset to the community. I’ve started working with the JSON files from Hugging Face, but some videos are blocked due to geographic...

When I read many articles about VFM, I often find that methods incorporating the audio modality tend to perform better than those using only video and text. Could you please...

When running demo.ipynb for InternVideo2_stage2_1B, intern_model, tokenizer = setup_internvideo2(config) emits the following warning: load_state_dict: _IncompatibleKeys(missing_keys=[], unexpected_keys=['temp', 'itm_head.weight', 'itm_head.bias']) In the end it still produces the following output: text: A man in a gray sweater plays fetch with his dog in the snowy...

Dear Team, Thank you for the great work. I was currently exploring the InternVideo2-Chat 8B and had a few questions/doubts regarding it. 1. What is the visual encoder used? Is...

I am trying to download a downstream classification task model (VideoMAE-L K700) but got this instead: ``` This XML file does not appear to have any style information associated with...

Hi, thanks for your fantastic video foundation model! I was interested in exploring the capabilities of InternVideo2-Chat for both images and video. According to the Huggingface code, the model can...

Thanks for the great work! In stage 3, the video encoder is updated to improve its support for video-centric dialogue. Will stage 3 training affect the performance on basic video...