Transformers-Tutorials
Fine-tuning ViLT for VQA on a custom dataset
Hi, I have gone through the tutorial on fine-tuning ViLT on the VQAv2 dataset and seen how you constructed the dataset and the data loader. I am trying to apply the same approach to my own dataset, which has 6 question-answer pairs for each image. I am a beginner, so I am wondering what I should take into consideration when following the same steps.
Note: my corpus is of course not the same as VQAv2, but most of the answers are a single word and some contain 2-3 words. Is it still valid to follow the same steps?
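To make the question concrete, here is a minimal sketch of one way the answers could be turned into classification labels. It assumes each distinct answer string, whether one word or several, is treated as a single class, and that each of the 6 question-answer pairs becomes its own training example. The sample data and the names `normalize`, `build_answer_vocab`, and `encode_answer` are hypothetical and are not taken from the tutorial.

```python
# Hypothetical samples: one entry per question-answer pair, so an image
# with 6 pairs would contribute 6 entries.
samples = [
    {"image": "img_001.jpg", "question": "What color is the car?", "answer": "red"},
    {"image": "img_001.jpg", "question": "Where is the car parked?", "answer": "Parking Lot"},
    {"image": "img_002.jpg", "question": "What color is the bus?", "answer": "red "},
]


def normalize(answer):
    """Lowercase and collapse whitespace so 'red ' and 'red' share a label."""
    return " ".join(answer.lower().split())


def build_answer_vocab(samples):
    """Map each distinct normalized answer (single- or multi-word) to a class id."""
    labels = sorted({normalize(s["answer"]) for s in samples})
    label2id = {answer: idx for idx, answer in enumerate(labels)}
    id2label = {idx: answer for answer, idx in label2id.items()}
    return label2id, id2label


def encode_answer(answer, label2id):
    """Build a target vector with 1.0 at the position of the correct answer."""
    target = [0.0] * len(label2id)
    target[label2id[normalize(answer)]] = 1.0
    return target


label2id, id2label = build_answer_vocab(samples)
targets = [encode_answer(s["answer"], label2id) for s in samples]
```

Under these assumptions, `len(label2id)` would be the number of output classes, and a multi-word answer such as "parking lot" is simply one more class rather than a sequence to generate.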
Thanks in advance. Your help will be appreciated.