fairseq-image-captioning
Training end-to-end on my own dataset
I have my own dataset of (image, caption) pairs on which I'd like to train the model. Does this repository allow training end-to-end on raw images, without first extracting features/bounding boxes?
Can I do that by simply not passing the --features flag?
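
For illustration, this is roughly the invocation I have in mind. It is only a sketch: apart from --features, which I mention above, the entry point, task name, and architecture name are assumptions on my part, not commands taken from this repository's README.

```sh
# Sketch only: fairseq-train is fairseq's standard training entry point; the
# --user-dir / --task / --arch values below are assumed placeholders, not
# confirmed settings from this repository.
fairseq-train \
  --user-dir task \
  --task captioning \
  --arch default-captioning-arch \
  --save-dir checkpoints
# Note: no --features flag is passed here, i.e. the model would be expected to
# read raw images and learn the visual features end-to-end.
```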