MNIST example code in README has same RHS for different variables
Hello and thanks for a very interesting repo! The top-level README provides helpful step-by-step instructions for obtaining the data corrected in this repo.
For MNIST, the instructions include two variables with assignments that share the same right-hand side:
test_data = datasets.MNIST(data_dir, train=False, download=True).test_labels.numpy()
test_labels = datasets.MNIST(data_dir, train=False, download=True).test_labels.numpy()
They're the same in Python:
In [6]: np.all(np.equal(test_data, test_labels))
Out[6]: True
It looks like the test_labels on the right-hand side should be test_data for the first assignment.
(There are warnings from torchvision 0.13.0 about the names changing, but whichever torchvision version is supported by the step-by-step tutorial in the README, it would help to be consistent.)
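For illustration, here is a minimal sketch of what consistent assignments could look like. It assumes a torchvision release where the attributes are exposed as `data` and `targets` (the names the warnings point to); the helper names `load_mnist_split` and `looks_like_images_and_labels` are hypothetical and not part of the README.

```python
def load_mnist_split(data_dir, train):
    """Return (images, labels) as NumPy arrays for one MNIST split."""
    # Imported lazily so the sanity check below can be used without torchvision.
    from torchvision import datasets

    dataset = datasets.MNIST(data_dir, train=train, download=True)
    # `data` and `targets` are the renamed attributes; older code used
    # train_data/test_data and train_labels/test_labels instead.
    return dataset.data.numpy(), dataset.targets.numpy()


def looks_like_images_and_labels(images, labels):
    """Cheap sanity check: 3-D image array, 1-D label array, matching lengths."""
    return (
        images.ndim == 3
        and labels.ndim == 1
        and images.shape[0] == labels.shape[0]
    )
```

With these, `test_data, test_labels = load_mnist_split(data_dir, train=False)` followed by `assert looks_like_images_and_labels(test_data, test_labels)` would have flagged the copy-paste mistake, since both variables would have held a 1-D label array.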
The two assignments for training data appear to have a similar problem:
bash$ sed 's!.*=!!' | while read rhs; do echo $rhs | openssl sha256; done
train_data = datasets.MNIST(data_dir, train=True, download=True).test_data.numpy()
train_labels = datasets.MNIST(data_dir, train=True, download=True).test_data.numpy()
870562877997826fd9627b9eb3890323171ea41841499caec4c8ea1ccddfeea4
870562877997826fd9627b9eb3890323171ea41841499caec4c8ea1ccddfeea4
bash$
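The same duplicate check can be sketched in Python. This is only an illustration with a hypothetical helper name, `duplicate_right_hand_sides`. Unlike the greedy `sed` pattern above, which strips everything through the last `=` on the line, it splits at the first `=` so the whole right-hand side is compared.

```python
from collections import defaultdict


def duplicate_right_hand_sides(lines):
    """Map each right-hand side shared by two or more assignments to its names."""
    names_by_rhs = defaultdict(list)
    for line in lines:
        # Split at the first '=' so keyword arguments like train=True stay in the RHS.
        name, sep, rhs = line.partition("=")
        if not sep:
            continue  # not an assignment line
        names_by_rhs[rhs.strip()].append(name.strip())
    return {rhs: names for rhs, names in names_by_rhs.items() if len(names) > 1}
```

Passing it the two training assignments returns a single entry listing both `train_data` and `train_labels`, which matches the identical hashes shown above.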
Good catch! Would you be interested in submitting a PR to fix this?
Yes, I can try to submit a PR after work. I'll do the minimal change and ignore the torchvision warnings if that works.
.../venv/lib/python3.9/site-packages/torchvision/datasets/mnist.py:80: UserWarning: test_data has been renamed data
warnings.warn("test_data has been renamed data")
.../venv/lib/python3.9/site-packages/torchvision/datasets/mnist.py:70: UserWarning: test_labels has been renamed targets
warnings.warn("test_labels has been renamed targets")
Fixed in https://github.com/cleanlab/label-errors/commit/27d5291ae292ebea9e591e5054615f3457d0ad21.