
Distributed training tutorial with DataLoader2

MatthewCaseres opened this issue 3 years ago • 3 comments

📚 The doc issue

I am not sure how to implement distributed training.

Suggest a potential alternative/fix

If there was a simple example that showed how to use DDP with the torchdata library it would be super helpful.

MatthewCaseres avatar Jul 25 '22 22:07 MatthewCaseres

I see some blog post that says to implement something like this -

train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=..., sampler=train_sampler)

I'm not sure what the torchdata way of doing this is.

MatthewCaseres avatar Jul 25 '22 23:07 MatthewCaseres

CC @ejguan

VitalyFedyunin avatar Jul 26 '22 15:07 VitalyFedyunin

It makes sense. We should add a section on distributed training with DataPipe and the existing DataLoader. And, after DataLoader2 + DistributedReadingService reaches beta, we can add a tutorial for them as well. Edit: Unfortunately, DistributedReadingService is still WIP; it is needed to make DataPipe work with DataLoader2 for distributed training.

@MatthewCaseres The TLDR for distributed training with DataPipe: you should add a sharding_filter operation to your pipeline, similar to what the tutorial suggests for multiprocessing. Then, DataLoader will execute sharding over your pipeline automatically based on your distributed setting.
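
For reference, a minimal sketch of that pattern with the existing DataLoader (the data source and transform_fn here are placeholders for illustration, not part of the original suggestion):

from torch.utils.data import DataLoader
from torchdata.datapipes.iter import IterableWrapper

def transform_fn(x):
    return x * 2  # placeholder transform

# IterableWrapper(range(1000)) stands in for a real data source.
datapipe = IterableWrapper(range(1000)).shuffle().sharding_filter()
datapipe = datapipe.map(transform_fn)

# With a sharding_filter in the pipeline, DataLoader shards the data
# across distributed ranks and worker processes automatically.
loader = DataLoader(datapipe, batch_size=16, num_workers=2)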

ejguan avatar Jul 26 '22 16:07 ejguan

Hi @ejguan , is the status for this still WIP? Thanks

austinmw avatar Jan 11 '23 13:01 austinmw

@austinmw Thanks for asking. Feature-wise, pure distributed training is ready and I am working on a tutorial currently. TLDR of the tutorial would be:

import torch
import torch.distributed as dist
from torchdata.dataloader2 import DataLoader2, DistributedReadingService

datapipe = source_dp.shuffle().sharding_filter()  # Make sure distributed sharding is mutually exclusive and collectively exhaustive
datapipe = datapipe.map(transform_fn)
...

def main(rank, world_size):
    dist.init_process_group("gloo", rank=rank, world_size=world_size)  # backend required; use "nccl" for GPU training
    rs = DistributedReadingService()
    dl = DataLoader2(datapipe, reading_service=rs)
    for epoch in range(10):
        torch.manual_seed(epoch)  # Set determinism for all ranks
        for d in dl:
            ...

There is one more blocking feature for distributed + MP: keeping non-replicable DataPipes in the main process. Once that lands, I will add a tutorial covering the distributed + MP use case.

ejguan avatar Jan 11 '23 15:01 ejguan

Thanks for your response, and looking forward to reading your tutorial!

austinmw avatar Jan 11 '23 16:01 austinmw

Hi @ejguan , I noticed this is closed. Is there a tutorial for distributed + MP? I see a tutorial for DistributedReadingService, but I thought this would be implemented with SequentialReadingService?

austinmw avatar Feb 01 '23 17:02 austinmw

SequentialReadingService is still WIP. I will add a tutorial with MP + Dist after it's landed.
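
As it is still WIP, this is only a sketch of the intended usage, and the exact API may change before it lands:

from torchdata.dataloader2 import (
    DataLoader2,
    DistributedReadingService,
    MultiProcessingReadingService,
    SequentialReadingService,
)

mp_rs = MultiProcessingReadingService(num_workers=2)
dist_rs = DistributedReadingService()
# Chain the two: distributed sharding is applied first, then
# multiprocessing within each rank.
rs = SequentialReadingService(dist_rs, mp_rs)
dl = DataLoader2(datapipe, reading_service=rs)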

ejguan avatar Feb 01 '23 17:02 ejguan

This feature is tracked in https://github.com/pytorch/data/issues/911

ejguan avatar Feb 01 '23 17:02 ejguan