Distributed training tutorial with DataLoader2
📚 The doc issue
I am not sure how to implement distributed training.
Suggest a potential alternative/fix
If there were a simple example showing how to use DDP with the torchdata library, it would be super helpful.
I've seen blog posts that suggest implementing something like this:
```python
import torch

train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=..., sampler=train_sampler)
```
I'm not sure what the equivalent way of doing this is with torchdata.
CC @ejguan
It makes sense. We should add a section on using a distributed-training `DataPipe` with the existing `DataLoader`.
And, after `DataLoader2` + `DistributedReadingService` reaches beta, we can add a tutorial for them as well. Edit: unfortunately, `DistributedReadingService` is still WIP to make `DataPipe` work with `DataLoader2` for distributed training.
@MatthewCaseres
The TL;DR for distributed training with `DataPipe`: add a `sharding_filter` operation to your pipeline, similar to what we suggest in the tutorial regarding multiprocessing. `DataLoader` will then execute sharding over your pipeline automatically, based on your distributed setting. A minimal sketch is below.
Hi @ejguan, is this still WIP? Thanks
@austinmw Thanks for asking. Feature-wise, pure distributed training is ready, and I am currently working on a tutorial. The TL;DR of the tutorial would be:
```python
import torch
import torch.distributed as dist
from torchdata.dataloader2 import DataLoader2, DistributedReadingService

# Make sure distributed sharding is mutually exclusive and collectively exhaustive
datapipe = source_dp.shuffle().sharding_filter()
datapipe = datapipe.map(transform_fn)
...

def main(rank, world_size):
    dist.init_process_group("gloo", rank=rank, world_size=world_size)  # backend chosen for illustration
    rs = DistributedReadingService()
    dl = DataLoader2(datapipe, reading_service=rs)
    for epoch in range(10):
        torch.manual_seed(epoch)  # Set determinism for all ranks
        for d in dl:
            ...
```
There is one more blocking feature for distributed + MP: making non-replicable DataPipes remain in the main process. Once that feature lands, I will add a tutorial covering the distributed + MP use case.
Thanks for your response, and looking forward to reading your tutorial!
Hi @ejguan, I noticed this is closed. Is there a tutorial for distributed + MP? I see a tutorial for `DistributedReadingService`, but I thought this would be implemented with `SequentialReadingService`?
`SequentialReadingService` is still WIP. I will add a tutorial with MP + Dist after it lands. A rough sketch of the intended composition is below.
This feature is tracked in https://github.com/pytorch/data/issues/911