First stage of train.py for ESRGAN is slow and single-threaded
Hi,
Not sure if this is a bug; maybe it's just a general question.
When using extract_images.py, it was possible to run it on multiple threads and save a lot of time.
I am starting the training now; the original dataset has almost 4000 images, so the resulting dataset with the sub-images is close to 100GB.
When I start the training there is a first, pretty long phase (many hours in my case) during which only one Python process is running. Is there a way to make this phase multithreaded to save time on this part of the training? This phase also does not seem to use the GPU, so I have to wait until it is finished to see whether the GPU is actually working, or to catch other issues with my config file.
Thank you so much for this amazing repo and all the work!
Do you mean when prefetching the data? I hit the same issue: the log shows that the model is created, and then nothing else is printed.
Aaaah, so there is indeed an option to use CUDA instead of the CPU for prefetching! I'll try this out in a couple of days. Maybe this will help!
If I understand correctly, it is the prefetch_mode: key in the YAML file.
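For reference, the relevant keys sit under the training dataset section of the YAML file. This is only a sketch based on the BasicSR option templates; the exact dataset name and the rest of the section come from your own config:

```yaml
datasets:
  train:
    # ... your existing dataset options (name, dataroot_gt, etc.) ...

    # CUDA prefetcher: moves batches to the GPU in a side stream.
    # Requires pin_memory to be enabled.
    prefetch_mode: cuda
    pin_memory: true

    # Alternative: CPU prefetcher with a small queue instead.
    # prefetch_mode: cpu
    # num_prefetch_queue: 4
```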
@gtnbssn Did you train on a custom dataset?
Yes.
IMO, the time-consuming operation is here:
https://github.com/xinntao/BasicSR/blob/5c757162b348a09d236e00c2cc04463c0a8bba45/basicsr/data/data_sampler.py#L33
The operation there is intended to preserve reproducibility.
If you have a large dataset, try reducing dataset_enlarge_ratio in the configuration file while keeping the Require iter number per epoch just above Total iters in the logging output.
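To see why this step scales with dataset_enlarge_ratio, here is a rough sketch of the kind of shuffle the sampler performs (hypothetical numbers; the real logic lives in BasicSR's EnlargedSampler, and it also rounds the total size up to a multiple of the world size):

```python
import math
import torch

# Hypothetical values; substitute your own dataset size and YAML setting.
num_images = 4000          # images in the (sub-image) training set
enlarge_ratio = 100        # dataset_enlarge_ratio from the config

# The sampler shuffles over the *enlarged* index range, so the cost of
# this single-process permutation grows linearly with enlarge_ratio.
num_samples = math.ceil(num_images * enlarge_ratio)
g = torch.Generator()
g.manual_seed(0)           # fixed seed keeps the shuffle reproducible
indices = torch.randperm(num_samples, generator=g).tolist()

print(len(indices))        # 400000
```

Lowering enlarge_ratio shrinks num_samples directly, which is why it shortens this startup phase.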
An example snippet of logging output:
2021-06-20 16:47:42,337 INFO: Training statistics:
Number of train images: 64612
Dataset enlarge ratio: 1000
Batch size per gpu: 16
World size (gpu number): 4
Require iter number per epoch: 1009563
Total epochs: 1; iters: 600000.
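The numbers in that log are consistent with a simple calculation (the formula is my reading of the output, not quoted from BasicSR):

```python
import math

# Values taken from the logging output above.
num_images = 64612
enlarge_ratio = 1000
batch_size_per_gpu = 16
world_size = 4

# Iterations needed to consume one (enlarged) epoch of data.
iters_per_epoch = math.ceil(
    num_images * enlarge_ratio / (batch_size_per_gpu * world_size))
print(iters_per_epoch)   # 1009563, matching "Require iter number per epoch"

# With total_iter: 600000, a single epoch is more than enough.
total_epochs = math.ceil(600000 / iters_per_epoch)
print(total_epochs)      # 1, matching "Total epochs: 1"
```

So the tuning target is to pick dataset_enlarge_ratio such that iters_per_epoch stays just above the configured total_iter.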