Data preparation
Thank you for your extraordinary work! I would like to know how to download the correct dataset, given the various choices on the official website.
Thank you for your interest. We used the 2014 training, validation, and test images and the corresponding annotations.
Thank you very much for your timely reply. Excuse me again: I would like to ask how to obtain the multi-modal, multi-task datasets used in your training process. Aren't the storage formats of the individual datasets different? My main problem is that I didn't quite understand the content of DATASET.md. I'm sorry to have taken up your time, and please accept my apologies again!
You can gather all the necessary multi-modal data for various tasks by following the instructions in DATASET.md to execute the scripts. Once the process is complete, all training data will be stored in the data/image_pairs_train directory.
This data must be generated before starting the training. During the training phase, the model will utilize data from different tasks for training.
To begin, you can run the following command:
python build_data/format_dataset_rp.py --save_root './image_pairs_train' --tasks ['det'] --data_root './data/coco'
Afterwards, you can modify the --tasks or --data_root parameters to generate data for other tasks.
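For convenience, the per-task commands can also be wrapped in a small loop. The sketch below is only an illustration: the task names and dataset roots other than det with ./data/coco are taken from the commands quoted later in this thread rather than confirmed by the maintainers, and the exact string expected by --tasks depends on how format_dataset_rp.py parses its arguments, so adjust everything to whatever DATASET.md specifies.

# Sketch only: run format_dataset_rp.py once per task. The task/dataset-root
# pairs below (other than det -> ./data/coco) are illustrative and should be
# replaced with the pairs listed in DATASET.md.
import subprocess

task_to_root = {
    "det": "./data/coco",
    "seg": "./data/ADE20k",
    "cls": "./data/Oxford-IIIT",
    "depes": "./data/NYUV2",
}

for task, data_root in task_to_root.items():
    subprocess.run(
        [
            "python", "build_data/format_dataset_rp.py",
            "--save_root", "./image_pairs_train",
            # Passed as written in the thread; adjust the brackets/quoting if
            # the script expects a different format for --tasks.
            "--tasks", f"['{task}']",
            "--data_root", data_root,
        ],
        check=True,  # stop immediately if generation for one task fails
    )

Keeping --save_root identical for every task means all generated image pairs land in the same training directory, consistent with the note above that the training data ends up under data/image_pairs_train.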
Let me know if you have any further questions.
Thank you very much for your previous answers, and I apologize again for my questions. I am still having some issues with building the multi-task instruction-tuning dataset. Can I build it by executing the following commands:
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['det'] --data_root './data/coco'
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['seg'] --data_root './data/ADE20k'
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['cls'] --data_root './data/Oxford-IIIT'
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['depes'] --data_root './data/NYUV2'
Also, when I process datasets other than COCO, the following errors occur:
It seems that the code still processes the COCO dataset. How can I solve this problem?
Finally, thank you for taking the time to look at my problem. Best regards.