Evan Zamir
Yep! Definitely worth doing.
I don't understand how this code from that Colab notebook actually works:
```
class DataParallelIterableDataset(IterableDataset):
    def __len__(self):
        # Caveat: When using DistributedSampler, we need to know the number of samples...
```
When I add the two lines to get world size and process rank to my `__iter__` code, it freezes my script. :(
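For anyone hitting the same thing: one option is to guard the lookup so that rank and world size are only requested from `torch.distributed` when a process group is actually initialized, with a single-process fallback otherwise. Below is a minimal sketch, not a confirmed fix for the freeze above. `get_rank_and_world_size` is a hypothetical helper name, and the idea is to call it lazily inside `__iter__` rather than when the dataset is constructed.

```python
try:
    import torch.distributed as dist
except ImportError:  # lets the sketch run without PyTorch installed
    dist = None


def get_rank_and_world_size():
    """Return (rank, world_size), defaulting to (0, 1) outside distributed runs.

    Hypothetical helper: intended to be called from inside __iter__, so the
    values reflect the process that is actually iterating.
    """
    if dist is not None and dist.is_available() and dist.is_initialized():
        return dist.get_rank(), dist.get_world_size()
    # No process group yet (or no PyTorch): behave like a single process.
    return 0, 1


rank, world_size = get_rank_and_world_size()
print(f"rank={rank} world_size={world_size}")
```

Because the lookup happens at iteration time, the dataset can be built by the DataModule before any processes are launched, and nothing has to be passed in by hand.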
What I'm failing to understand is how, in practice, to pass the rank and `world_size` to the dataset when it is created by my DataModule, before the Trainer is...
One question that probably seems obvious to you guys but not to me: do I have to explicitly call `init_process_group`? If so, where should that be done in a...
> @EvanZ I was also confused about this at first, but then figured it out. The Trainer does not need any information about the data to be instantiated. So I...
Ok...maybe that's the missing detail I needed. I'll work on it some more!
Hmm, that's interesting, and a different organization than I use. I define `{train/val/test}_dataloader` inside my `DataModule`. I do currently use `islice` as well, though, like this:
```
def __iter__(self) ->...
```
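For reference, here is a rough sketch of how an `islice`-based split can cover both distributed ranks and DataLoader workers, so that no two consumers yield the same item. It is an illustration under stated assumptions, not the code from either post: `shard` is a hypothetical function, and all four numbers are passed in explicitly, whereas in a real dataset they would come from `torch.distributed` and `torch.utils.data.get_worker_info()`.

```python
from itertools import islice


def shard(iterable, rank, world_size, worker_id=0, num_workers=1):
    """Return an iterator over the items belonging to one (rank, worker) pair.

    Hypothetical helper; the arguments are explicit here so the sketch runs
    without PyTorch.
    """
    total_shards = world_size * num_workers
    shard_id = rank * num_workers + worker_id
    # Start at this shard's offset, then step by the total shard count.
    return islice(iterable, shard_id, None, total_shards)


# 2 processes x 2 workers over 10 items: every item is seen exactly once.
parts = [
    list(shard(range(10), rank, 2, worker, 2))
    for rank in range(2)
    for worker in range(2)
]
print(parts)
```

Note that the shards can differ in length (3 vs. 2 items here). Under DDP, ranks that run different numbers of batches can end up waiting on each other, which may be worth checking if a script hangs.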