rhingo
Hi, I saw that the names of the full 5-minute videos from the MEVA dataset were included with the MEVID dataset. I am wondering if the annotations identifying the...
### Prerequisites - [X] I have read the [documentation](https://hf.co/docs/autotrain). - [X] I have checked other issues for similar problems. ### Backend Local ### Interface Used CLI ### CLI Command ```autotrain...
Do you happen to have the checkpoints that were used at the time the paper was written? It appears as though fairseq has been updated quite a bit since then,...
Hi, the paper describes the input to the Llama2 model as one audio token, three visual tokens (one from each of three separate encoders), and text tokens. However, it seems...