InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Thank you for your work! I have a question about the zero-shot video retrieval task on the ActivityNet dataset: which pretrained model should I use to reproduce the reported performance? Is it CLIP ViT-L-14.pt?...
Hello, I am unable to run the spatiotemporal action localization. It would be good to know how to run the shared module for spatiotemporal action recognition. Best
Hello, could you please release the checkpoint of the ViT-H model? Thanks.
Hi, since the 200M pretraining dataset is much bigger than the 10M version, why is the zero-shot performance not superior to that of the 10M one?
Hi, the code says that video training needs find_unused_parameters=True, but for image training it can be set to False. I wonder why? Thank you.
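For context, a minimal sketch of where this flag enters, assuming a standard torch DistributedDataParallel wrapper (`build_model` and `wrap_model` are hypothetical names, not the repo's code): if some parameters of the video model receive no gradient on a given step (e.g. a branch that is only active for certain inputs), DDP must search for unused parameters or the gradient synchronization will error out; an image model that always uses every parameter can keep the flag off and avoid the extra overhead.

```python
# Minimal sketch, not the repo's actual launcher: how find_unused_parameters
# is typically passed to DistributedDataParallel.
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(model: torch.nn.Module, rank: int, is_video: bool) -> DDP:
    # Video training: some parameters may not get gradients every step,
    # so DDP must detect unused parameters to keep gradient sync consistent.
    # Image training: all parameters are used, so False avoids the extra
    # graph traversal cost.
    return DDP(
        model.to(rank),
        device_ids=[rank],
        find_unused_parameters=is_video,
    )
```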
I want to run the finetuned InternVideo-MM-L-14 | ActivityNet model. I have my own custom videos. Do you have a simple demo script to run this model? (Similar to...
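Pending an official demo, here is a hypothetical sketch of running zero-shot video-text retrieval on custom videos; the model-loading and encoding calls (`encode_video`, `encode_text`) are placeholders, not the actual InternVideo API, and only the frame sampling and cosine-similarity ranking are standard.

```python
# Hypothetical demo sketch: sample frames from a custom video, encode video and
# captions with a CLIP-style model, and rank captions by cosine similarity.
import cv2
import numpy as np
import torch

def sample_frames(path: str, num_frames: int = 8) -> np.ndarray:
    """Uniformly sample `num_frames` RGB frames from a video file."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    idxs = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for i in idxs:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)  # (T, H, W, 3)

@torch.no_grad()
def rank_texts(model, video_path: str, texts: list[str]):
    frames = sample_frames(video_path)
    vid_emb = model.encode_video(frames)   # placeholder: model-specific call
    txt_emb = model.encode_text(texts)     # placeholder: model-specific call
    vid_emb = vid_emb / vid_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    sims = (vid_emb @ txt_emb.T).squeeze(0)  # cosine similarity per caption
    return sorted(zip(texts, sims.tolist()), key=lambda x: -x[1])
```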
In https://github.com/OpenGVLab/InternVideo/blob/6264cc85f72e38dce7e38549182d0369c50cde00/Data/InternVid/viclip/viclip.py#L140-L143 a single video is converted to a batch of 8 images (instead of a batch of one video). This bug is transferred to the demo in https://github.com/OpenGVLab/InternVideo/blob/6264cc85f72e38dce7e38549182d0369c50cde00/Data/InternVid/viclip/__init__.py#L71 where...
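An illustrative sketch of the shape issue being reported (the frame count and tensor shapes below are assumptions for illustration, not the repo's exact preprocessing): the sampled frames of one video should be stacked into a single clip with an explicit batch dimension, not passed as a batch of independent images.

```python
# Shape illustration: 8 frames of one video should form a (1, T, C, H, W) clip,
# not an (8, C, H, W) batch of 8 separate images.
import torch

num_frames, C, H, W = 8, 3, 224, 224
frames = torch.rand(num_frames, C, H, W)  # 8 preprocessed frames of one video

as_image_batch = frames                   # (8, 3, 224, 224): treated as 8 images
as_video_batch = frames.unsqueeze(0)      # (1, 8, 3, 224, 224): one video clip

print(as_image_batch.shape, as_video_batch.shape)
```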
Thank you for the nice work. Regarding training ViCLIP, I would like to clarify my understanding of the paper. If the vision transformer is not pre-trained, such as with the MAE method, then it...
Hello, I am glad that you open-sourced the checkpoints and the demo script recently. When I ran the provided demo script, I found that `viclip.py` attempts to import...
Hi authors! I'm trying to reproduce InternVideo+ActionFormer for temporal action localization. I just wanted to know your timeline for releasing the UniformerV2 features. Thank you.