Shuffler inside a Zipper only shuffles some elements

Open quancs opened this issue 3 years ago • 3 comments

🐛 Describe the bug

d1 and d2 have different lengths, and d3 is a Zipper that contains both of them.

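import torchdata.datapipes as dp
from torch.utils.data import DataLoader

# d1 has 4 elements and d2 has 6; each is wrapped in a map-style Shuffler.
d1 = dp.map.SequenceWrapper(['0', '1', '2', '3'])
d1 = dp.map.Shuffler(d1)

d2 = dp.map.SequenceWrapper(['a', 'b', 'c', 'd', 'e', 'f'])
d2 = dp.map.Shuffler(d2)

# Zip the two shuffled datapipes and load them with shuffling enabled.
d3 = dp.map.Zipper(d2, d1)
dl = DataLoader(d3, batch_size=None, num_workers=1, shuffle=True)

# Iterate over the DataLoader 10 times and print what each pass yields.
for i in range(10):
    o = []
    for x in dl:
        o.append(x)
    print(o)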

The results:

[['f', '2'], ['a', '3'], ['e', '0'], ['c', '1']]
[['e', '0'], ['c', '1'], ['f', '2'], ['a', '3']]
[['c', '1'], ['a', '3'], ['f', '2'], ['e', '0']]
[['e', '0'], ['a', '3'], ['c', '1'], ['f', '2']]
[['e', '0'], ['c', '1'], ['f', '2'], ['a', '3']]
[['c', '1'], ['e', '0'], ['f', '2'], ['a', '3']]
[['a', '3'], ['e', '0'], ['f', '2'], ['c', '1']]
[['a', '3'], ['c', '1'], ['e', '0'], ['f', '2']]
[['c', '1'], ['e', '0'], ['f', '2'], ['a', '3']]
[['e', '0'], ['c', '1'], ['a', '3'], ['f', '2']]

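As we can see, across all 10 runs only four of the six elements of d2 ('a', 'c', 'e', 'f') ever appear; 'b' and 'd' never show up. Each d2 element is also always paired with the same d1 element, so only the order of the pairs changes between runs.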
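The pattern above can be reproduced without torchdata by a small pure-Python model. This is only a sketch under an assumption: that the map-style Shuffler draws its permutation once and never redraws it, while the DataLoader's own shuffling reorders the zipped indices on each pass. `FixedShuffler`, `ToyZipper`, and `run_epochs` are hypothetical names for illustration, not torchdata classes.

```python
import random


class FixedShuffler:
    """Toy map-style shuffler (assumption): one permutation, drawn at construction."""

    def __init__(self, data, seed):
        self.data = list(data)
        self.perm = list(range(len(self.data)))
        random.Random(seed).shuffle(self.perm)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[self.perm[i]]


class ToyZipper:
    """Toy map-style zipper: index i yields the i-th item of every input."""

    def __init__(self, *pipes):
        self.pipes = pipes

    def __len__(self):
        # Only indices valid for the shortest input can be requested.
        return min(len(p) for p in self.pipes)

    def __getitem__(self, i):
        return [p[i] for p in self.pipes]


def run_epochs(zipped, epochs, seed=0):
    """Visit every index in a fresh random order each epoch, like a shuffling sampler."""
    rng = random.Random(seed)
    results = []
    for _ in range(epochs):
        order = list(range(len(zipped)))
        rng.shuffle(order)
        results.append([zipped[i] for i in order])
    return results


d1 = FixedShuffler("0123", seed=1)
d2 = FixedShuffler("abcdef", seed=2)
epochs = run_epochs(ToyZipper(d2, d1), epochs=10)

# Collect every d2 element that was ever emitted across all epochs.
seen_d2 = {pair[0] for epoch in epochs for pair in epoch}
print(sorted(seen_d2))
```

In this model the zipper only ever asks d2 for indices 0 to 3, and those indices always map to the same four letters, so two letters of d2 are unreachable and the pairings never change.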

Versions

Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-6ubuntu2) 7.5.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.4.0-124-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
GPU 4: NVIDIA A100-SXM4-80GB
GPU 5: NVIDIA A100-SXM4-80GB
GPU 6: NVIDIA A100-SXM4-80GB
GPU 7: NVIDIA A100-SXM4-80GB

Nvidia driver version: 510.85.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy==0.971
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.4
[pip3] pytorch-lightning==1.7.3
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.12.1
[pip3] torch-complex==0.4.3
[pip3] torch-optimizer==0.3.0
[pip3] torch-stoi==0.1.2
[pip3] torchaudio==0.12.1
[pip3] torchdata==0.4.1
[pip3] torchmetrics==0.9.3
[pip3] torchvision==0.13.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.6.0 hecad31d_10 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7e14d7c_0 conda-forge
[conda] mkl_fft 1.3.1 py39h0c7bc48_1 conda-forge
[conda] mkl_random 1.2.2 py39hde0f152_0 conda-forge
[conda] numpy 1.22.4 pypi_0 pypi
[conda] pytorch 1.12.1 py3.9_cuda11.6_cudnn8.3.2_0 pytorch
[conda] pytorch-lightning 1.7.3 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch-complex 0.4.3 pypi_0 pypi
[conda] torch-optimizer 0.3.0 pypi_0 pypi
[conda] torch-stoi 0.1.2 pypi_0 pypi
[conda] torchaudio 0.12.1 py39_cu116 pytorch
[conda] torchdata 0.4.1 pypi_0 pypi
[conda] torchmetrics 0.9.3 pypi_0 pypi
[conda] torchvision 0.13.1 py39_cu116 pytorch

quancs avatar Sep 08 '22 11:09 quancs

The iterable-style Shuffler combined with Zipper works correctly for datapipes of different lengths. The problem seems to affect only the map-style Shuffler and Zipper.

quancs avatar Sep 08 '22 12:09 quancs

Thank you for raising this. I am currently working on a PR to enable proper shuffling for MapDataPipe. The behavior above occurs because the map-style Shuffler is not reshuffled per epoch.

https://github.com/pytorch/pytorch/pull/83202 has landed to make sure Shuffler is properly reshuffled per epoch. I am still working on https://github.com/pytorch/pytorch/pull/82975 so that MapDataPipe is seeded properly by DataLoader.

I will post here when that PR lands, and then you can test it with the nightly releases.
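To illustrate why reshuffling per epoch matters here, the following is a minimal pure-Python sketch, not the code from either PR. `ReshufflingShuffler` and its `reset` method are hypothetical names chosen for illustration; the sketch only assumes that the permutation is redrawn before each epoch.

```python
import random


class ReshufflingShuffler:
    """Toy map-style shuffler whose permutation can be redrawn between epochs."""

    def __init__(self, data, seed=None):
        self.data = list(data)
        self.rng = random.Random(seed)
        self.perm = list(range(len(self.data)))
        self.reset()

    def reset(self):
        # Hypothetical hook: draw a new permutation for the next epoch.
        self.rng.shuffle(self.perm)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        return self.data[self.perm[i]]


d1 = ReshufflingShuffler("0123", seed=1)
d2 = ReshufflingShuffler("abcdef", seed=2)

seen_d2 = set()
for epoch in range(50):
    d1.reset()
    d2.reset()
    # A zipper can only request indices valid for the shorter input.
    n = min(len(d1), len(d2))
    seen_d2.update(d2[i] for i in range(n))

print(sorted(seen_d2))
```

Because indices 0 to 3 of d2 map to a different subset of letters after each reset, every element of d2 becomes reachable over enough epochs, even though each single epoch still yields only four pairs.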

ejguan avatar Sep 08 '22 13:09 ejguan

Great! Thank you.

quancs avatar Sep 08 '22 15:09 quancs