Currently, DatasetFolder load images and transform them in the same thread, so the latency is io latency + transform latency. It is better if we load images in another thread, so the latency is max(io, transform).

some test results:

python3 main.py -a resnet18 -b 256 -j 8 /imagenet --gpu 0

base: 73.4s (sum of the data part of Epoch [0]: results)
pipelined: 198.47s

CUDA_VISIBLE_DEVICES="0,1,2,3" python main.py -a resnet18 -b 1024 -j 16 /data/PyTorchRawImage/

base: 913s
pipelined: 580s

See also the coupled pr in https://github.com/pytorch/pytorch/pull/28199 .

Oct 17 '19 03:10 lycplus

Codecov Report

Merging #1482 into master will decrease coverage by 0.24%. The diff coverage is 23.68%.

@@            Coverage Diff             @@
##           master    #1482      +/-   ##
==========================================
- Coverage   64.46%   64.21%   -0.25%     
==========================================
  Files          83       83              
  Lines        6421     6458      +37     
  Branches      982      989       +7     
==========================================
+ Hits         4139     4147       +8     
- Misses       1992     2020      +28     
- Partials      290      291       +1

Impacted Files	Coverage Δ
torchvision/datasets/folder.py	`62.6% <23.68%> (-19.45%)`	:arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 53b062c...b4bd6b2. Read the comment docs.

Oct 17 '19 06:10 codecov-io

Thanks for the PR!

Can you give more context on why this change is needed, and what it is trying to achieve?

Also, opening an issue before sending a PR adding a new functionality is generally the recommended way of proceeding, so that we can discuss beforehand about the needs of such functionality.

Hi, I have updated the pr message to give the context for this.

Oct 22 '19 03:10 lycplus

@fmassa could you please have a look at this pr ?

Mar 21 '20 10:03 lycplus

I think this adds a lot of complexity on something that is supposed to be simple, and the interactions with the multi-threaded DataLoader is not clear to me.

Even if this improves performance, I would rather not have it merged in torchvision. Indeed, one could ask why such an approach is not implemented for the other datasets.

Mar 24 '20 13:03 fmassa

Pipeline io.

Codecov Report