Set better defaults for `MultiProcessingReadingService`
🚀 The feature
```python
class MultiProcessingReadingService(ReadingServiceInterface):
    num_workers: int = get_number_of_cpu_cores()
    pin_memory: bool = True
    timeout: float
    worker_init_fn: Optional[Callable[[int], None]]  # Remove this?
    prefetch_factor: int = profile_optimal_prefetch_factor(model)  # model: nn.Module
    persistent_workers: bool = True
```
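The `get_number_of_cpu_cores()` helper above is part of the proposal; its body doesn't exist yet. A minimal sketch of what it could be, assuming we just want the core count with a safe fallback:

```python
import os

def get_number_of_cpu_cores() -> int:
    # Hypothetical helper: default worker count based on available CPU cores.
    # os.cpu_count() can return None on some platforms, so fall back to 1.
    return os.cpu_count() or 1
```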
I can add these; I'm opening this issue to discuss whether changing the defaults is a good idea.
+: users get better out-of-the-box performance with torchdata
-: backward-compatibility issues when moving from DataLoader v1 to DataLoader v2
Motivation, pitch
There are many issues on the PyTorch discussion forum, Stack Overflow, and various blogs describing how to configure data loaders for optimal performance. Many of the tricks haven't changed (e.g. `pin_memory=True`, `num_workers = num_cpu_cores`, `persistent_workers=True`), and since we're in the process of developing DataLoader v2, now may be a good time to revisit these default values:
- https://www.jpatrickpark.com/post/prefetcher/#:~:text=The%20prefetch_factor%20parameter%20only%20controls,samples%20prefetched%20across%20all%20workers.)
- https://stackoverflow.com/questions/53998282/how-does-the-number-of-workers-parameter-in-pytorch-dataloader-actually-work
- https://discuss.pytorch.org/t/when-to-set-pin-memory-to-true/19723
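The advice in those links typically boils down to a handful of keyword arguments. A sketch of what "better defaults" amount to in practice, using a hypothetical `recommended_loader_kwargs` helper (the keys match the existing `torch.utils.data.DataLoader` parameters; the helper itself is an assumption, not an existing API):

```python
import os
from typing import Optional

def recommended_loader_kwargs(num_workers: Optional[int] = None) -> dict:
    """Hypothetical helper returning commonly recommended data-loader settings."""
    workers = num_workers if num_workers is not None else (os.cpu_count() or 1)
    return {
        "num_workers": workers,              # one worker per CPU core is a common rule of thumb
        "pin_memory": True,                  # speeds up host-to-GPU transfers
        "persistent_workers": workers > 0,   # avoid respawning workers every epoch
        "prefetch_factor": 2 if workers > 0 else None,  # batches prefetched per worker
    }

# Usage (assuming an existing dataset):
# loader = torch.utils.data.DataLoader(dataset, batch_size=32, **recommended_loader_kwargs())
```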
Alternatives
- Instead of setting reasonable defaults, we could extend `linter.py` to suggest some of these tips when we notice sources of slowdown
- Do nothing, and suggest people read the documentation when tuning performance
Additional context
No response
Just a note: `MultiProcessingReadingService` (MPRS) is a temporary reading service. We will replace it with `PrototypeMultiProcessingReadingService`. For `prefetch_factor`, we might provide it as a DataPipe to let users define it in their pipeline. And we have decided to make `pin_memory` an adapter function. See: https://github.com/pytorch/data/pull/485
As for `worker_init_fn`, I think we still need it so that users can control the state of the worker process for special use cases.
I like the idea of providing a reasonable number of workers by default, since it makes no sense to have to pass `num_workers=0` to MPRS just to get single-process iteration.