Andrew Bydlon
@ejguan: Do you have any suggestions for properly resetting DataLoader2 after each epoch, e.g. with `worker_reset_fn`?
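For concreteness, here is roughly what I have in mind. This is only a sketch: the exact arguments torchdata passes to the reset callback depend on the release, so the callback signature and the per-epoch reseeding call below are assumptions to be checked against the installed version.

```python
from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService
from torchdata.datapipes.iter import IterableWrapper

def reset_worker(datapipe, worker_info, *args):
    # Hypothetical reset hook: re-open file handles, clear per-worker caches,
    # etc. at the start of every epoch, then hand the datapipe back.
    return datapipe

datapipe = IterableWrapper(range(1000)).shuffle().sharding_filter()
rs = MultiProcessingReadingService(num_workers=4, worker_reset_fn=reset_worker)
dl = DataLoader2(datapipe, reading_service=rs)

for epoch in range(3):
    dl.seed(epoch)        # reseed so shuffling differs per epoch
    for batch in dl:
        pass

dl.shutdown()
```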
@Adenialzz: To get what I showed above, it's more or less the same setup as for a torch dataset. Replace the dataset with a datapipe: `sampler = DistributedSampler(datapipe)`...
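A minimal sketch of that setup, assuming a map-style datapipe with a length and an already-initialized process group (the data and batch size here are placeholders):

```python
from torch.utils.data import DataLoader, DistributedSampler
from torchdata.datapipes.map import SequenceWrapper

# Map-style datapipe with a length, used exactly like a torch Dataset
datapipe = SequenceWrapper(list(range(10_000)))   # placeholder data

sampler = DistributedSampler(datapipe)            # requires len(datapipe)
loader = DataLoader(datapipe, batch_size=32, sampler=sampler, num_workers=4)

for epoch in range(10):
    sampler.set_epoch(epoch)                      # reshuffle across ranks each epoch
    for batch in loader:
        ...                                       # training step
```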
I'll give it a try today.
Sorry for the delay @Adenialzz. You are correct that it doesn't work with DDP when the iterable datapipe has no length. I reverted to DL2 despite its notably...
Thanks Russell. Looking forward to hearing more! I've tried to implement it, but my loss starts creeping up after a few thousand steps. I hypothesize that FSDP wrapping each module...
This can be achieved using `FSDP.summon_full_params()` for the EMA updates :)
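Roughly what that looks like in practice, as a minimal sketch: it assumes both the online model and its EMA copy are FSDP-wrapped with matching parameter layouts, and `model`, `ema_model`, and `decay` are placeholder names rather than the actual training code.

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

@torch.no_grad()
def update_ema(model: FSDP, ema_model: FSDP, decay: float = 0.999) -> None:
    # Gather the full (unsharded) parameters on each rank; writeback is on by
    # default, so the in-place EMA update is written back to the shards.
    with FSDP.summon_full_params(model), FSDP.summon_full_params(ema_model):
        for p, ema_p in zip(model.parameters(), ema_model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```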
I'm also using dictionaries and seeing a memory leak. I'm highlighting a different issue there, but I'm seeing a small increase in memory usage over time as well: https://github.com/pytorch/data/issues/1185
I too would like to hear what limitations you are referencing. If it is performance-oriented, I believe there's an argument. You could make something compatible with a compiled framework,...
I posted another MemoryError that may be related here: https://discuss.pytorch.org/t/torchdata-w-ddp-start-of-epoch-2-get-memoryerror/179523 My MemoryError also occurs at the start of the epoch while using DDP and distributed multiprocessing. It seems to depend...
It's difficult to provide code for this purpose as the code is the property of a large corporation. Some other notes expanding on the earlier thoughts: Mention of shuffling cause...