InternVideo issues

How can I finetune InternVideo2_Chat_8B_InternLM2_5

I would like to fine-tune InternVideo2_Chat_8B_InternLM2_5 on a specific task. How should I do this? Thanks!

Annie1900

Some doubts about the absolute value of ViCLIP similarity

4

Thanks for such beautiful work! In the past, the similarity between a video and a text was usually computed by scoring each frame against the text with an image-text CLIP, and then taking...

LiuHuijie6410
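The frame-level baseline this question describes can be sketched in plain Python. This is a minimal illustration, not ViCLIP or CLIP code: the function names and the toy 2-D embeddings are hypothetical, and because the snippet is truncated before it says how the per-frame scores are combined, the sketch stops at the per-frame scores and leaves aggregation to the caller.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def frame_text_similarities(
    frame_embeddings: Sequence[Sequence[float]],
    text_embedding: Sequence[float],
) -> List[float]:
    """Score every frame embedding against one text embedding."""
    return [cosine_similarity(frame, text_embedding) for frame in frame_embeddings]


# Toy embeddings standing in for image-encoder and text-encoder outputs.
frames = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]
text = [1.0, 0.0]

scores = frame_text_similarities(frames, text)
print(scores)
```

Cosine similarity is bounded in [-1, 1], so absolute values are only comparable between models that share an embedding space and normalization.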

videochat2 with internvideo

3

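Hello, how do I run the internvideo2+videochat2 model? After downloading internvideo2_s3, which videochat2 weights should I download? Their repository has only released the weights for the UMT encoder. Thank you very much!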

yepzhang

InternVid Raw Videos

1

Hi guys, Thank you for providing this valuable dataset to the community. I’ve started working with the JSON files from Hugging Face, but some videos are blocked due to geographic...

waybarrios

Does InternVideo2-1B-s2 use AudioEncoder?

3

When I read many articles about VFM, I often find that methods incorporating the audio modality tend to perform better than those using only video and text. Could you please...

haoyi199815

load_state_dict: _IncompatibleKeys(missing_keys=[], unexpected_keys=['temp', 'itm_head.weight', 'itm_head.bias'])

3

When running demo.ipynb for InternVideo2_stage2_1B, intern_model, tokenizer = setup_internvideo2(config) emits the following warning: load_state_dict: _IncompatibleKeys(missing_keys=[], unexpected_keys=['temp', 'itm_head.weight', 'itm_head.bias']) The run still produces the following result: text: A man in a gray sweater plays fetch with his dog in the snowy...

lexilii
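For context on the warning above: in PyTorch, `load_state_dict(..., strict=False)` returns a report with `missing_keys` (parameters the model expects but the checkpoint lacks) and `unexpected_keys` (entries in the checkpoint that the model has no slot for). The sketch below mimics that key comparison in plain Python so it runs without PyTorch. The helper name and the two model key names are hypothetical; only the three unexpected keys are taken from the issue. It does not say whether those keys matter for the demo, only what the message means.

```python
from typing import Iterable, List, NamedTuple


class IncompatibleKeys(NamedTuple):
    """Plain-Python stand-in for the report PyTorch returns."""
    missing_keys: List[str]
    unexpected_keys: List[str]


def compare_state_dict_keys(
    model_keys: Iterable[str],
    checkpoint_keys: Iterable[str],
) -> IncompatibleKeys:
    """Report which keys are absent from, or extra in, a checkpoint."""
    model_set = set(model_keys)
    checkpoint_set = set(checkpoint_keys)
    missing = sorted(model_set - checkpoint_set)      # model wants them, checkpoint lacks them
    unexpected = sorted(checkpoint_set - model_set)   # checkpoint has them, model does not
    return IncompatibleKeys(missing, unexpected)


# Hypothetical model keys; the extra checkpoint keys are the ones named in the issue.
model_keys = ["vision_encoder.weight", "text_encoder.weight"]
checkpoint_keys = model_keys + ["temp", "itm_head.weight", "itm_head.bias"]

report = compare_state_dict_keys(model_keys, checkpoint_keys)
print(report)
```

An empty `missing_keys` list means every parameter the model defines was found in the checkpoint; the unexpected entries are simply not loaded.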

InternVideo2-Chat 8B Visual Encoder and Text Encoder

2

Dear Team, Thank you for the great work. I was currently exploring the InternVideo2-Chat 8B and had a few questions/doubts regarding it. 1. What is the visual encoder used? Is...

Divyanshupy

Downstream classification task checkpoint link not working

2

I am trying to download a downstream classification task model (VideoMAE-L K700) but got this instead: ``` This XML file does not appear to have any style information associated with...

MH-Python

What is the proper way to preprocess image inputs for InternVideo2-Chat?

7

Hi, thanks for your fantastic video foundation model! I was interested in exploring the capabilities of InternVideo2-Chat for both images and video. According to the Huggingface code, the model can...

chancharikmitra

Can stage-3 training further improve the performance of InternVideo2 on basic video tasks

Thanks for the great work! In stage 3, the video encoder is updated to improve its support for video-centric dialogue. Will stage 3 training affect the performance on basic video...

fushh
