InstructCV

Data preparation

Open sunhope54 opened this issue 1 year ago • 4 comments

Thank you for your extraordinary work! The official website offers several download options. Which dataset should I download? [image]

sunhope54, Jan 09 '25 00:01

Thank you for your interest. We used the 2014 training, validation, and test images and the corresponding annotations.

sunrainyg, Jan 09 '25 15:01

Thank you very much for your timely reply. Excuse me again. I would like to ask how to obtain the multi-modal, multi-task datasets used in your training process. Isn't each dataset stored in a different format? My main problem is that I didn't quite understand the content of DATASET.md. I'm sorry to have taken up your time. Please accept my apologies again!

sunhope54, Jan 10 '25 00:01

You can gather all the necessary multi-modal data for various tasks by following the instructions in DATASET.md to execute the scripts. Once the process is complete, all training data will be stored in the data/image_pairs_train directory.

This data must be generated before starting the training. During the training phase, the model will utilize data from different tasks for training.

To begin, you can run the following command:

python build_data/format_dataset_rp.py --save_root './image_pairs_train' --tasks ['det'] --data_root './data/coco'

Afterwards, you can modify the --tasks or --data_root parameters to generate data for other tasks.
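As a rough sketch of what varying --tasks and --data_root might look like, the snippet below prints one command per task in the same form as the command above. The task names and dataset paths in TASK_DATA_ROOTS are assumptions taken from the follow-up question in this thread. They have not been confirmed against the repository, and the helper build_command is hypothetical.

```python
# Hypothetical task -> dataset-root mapping. The names and paths come from
# the commands quoted in this thread and are NOT verified against the repo.
TASK_DATA_ROOTS = {
    "det": "./data/coco",
    "seg": "./data/ADE20k",
    "cls": "./data/Oxford-IIIT",
    "depes": "./data/NYUV2",
}


def build_command(task, data_root, save_root="./image_pairs_train"):
    """Return one shell command string in the same form used in this thread."""
    return (
        "python build_data/format_dataset_rp.py "
        f"--save_root '{save_root}' "
        f"--tasks ['{task}'] "
        f"--data_root '{data_root}'"
    )


def build_all_commands(save_root="./image_pairs_train"):
    """Build one command per task, all writing to the same save_root."""
    return [
        build_command(task, root, save_root)
        for task, root in TASK_DATA_ROOTS.items()
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed and run one at a time.
    for command in build_all_commands():
        print(command)
```

The sketch only prints the commands and does not execute them, so each one can be checked against DATASET.md before it is run.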

Let me know if you have any further questions.

sunrainyg, Jan 10 '25 02:01

Thank you very much for your previous answers, and I apologize again for my questions. I am still having some issues with building a multi-task instruction-tuning dataset. Can I build the dataset by executing the following commands?

python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['det'] --data_root './data/coco'
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['seg'] --data_root './data/ADE20k'
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['cls'] --data_root './data/Oxford-IIIT'
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['depes'] --data_root './data/NYUV2'

Also, when I process datasets other than COCO, the following errors occur:

[image]

It seems that the code is still processing the COCO dataset. How can I solve this problem? Finally, thank you for taking the time to look at my problem. Best regards.

sunhope54, Jan 16 '25 02:01