Reproducing the demo examples from the InternVideo2.5 paper
Great work! I have tried the demo on HuggingFace. How can I reproduce the results shown in Figure 4 and Figure 5 of the paper, i.e., retrieve the specific times or frames that correspond to a prompt? For example: "In this video, in which frames does a man appear?" or "In this video, from which second to which second does a man appear?" Currently, the demo does not output the correct frames or seconds. Many thanks!
I have the same question.
Same problem here.