MNIST example code in README has same RHS for different variables

Open ecashin opened this issue 3 years ago • 2 comments

Hello and thanks for a very interesting repo! The top-level README provides helpful step-by-step instructions for obtaining the data corrected in this repo.

For MNIST, the instructions include two variables with assignments that share the same right-hand side:

test_data = datasets.MNIST(data_dir, train=False, download=True).test_labels.numpy()
test_labels = datasets.MNIST(data_dir, train=False, download=True).test_labels.numpy()

They're the same in Python:

In [6]: np.all(np.equal(test_data, test_labels))
Out[6]: True

It looks like the test_labels on the right-hand side should be test_data for the first assignment.

(There are warnings from torchvision 0.13.0 about the names changing, but whichever torchvision version is supported by the step-by-step tutorial in the README, it would help to be consistent.)
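For illustration, here is a minimal sketch of what consistent assignments might look like, assuming the newer attribute names `data` and `targets` that the torchvision warnings point to. The helper name `images_and_labels` is hypothetical and not part of the README; the actual fix may well keep the older names.

```python
def images_and_labels(dataset):
    """Return (images, labels) as NumPy arrays.

    Assumes a torchvision-style dataset exposing `.data` and `.targets`
    tensors (the names the deprecation warnings recommend).
    """
    return dataset.data.numpy(), dataset.targets.numpy()


# Usage sketch (requires torchvision; `data_dir` as in the README):
#
#   from torchvision import datasets
#   test_data, test_labels = images_and_labels(
#       datasets.MNIST(data_dir, train=False, download=True))
#   train_data, train_labels = images_and_labels(
#       datasets.MNIST(data_dir, train=True, download=True))
```

Loading each split once and unpacking both arrays from the same object also avoids constructing the dataset twice per split.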

The two assignments for training data appear to have a similar problem:

bash$ sed 's!.*=!!' | while read rhs; do echo $rhs | openssl sha256; done
train_data = datasets.MNIST(data_dir, train=True, download=True).test_data.numpy()
train_labels = datasets.MNIST(data_dir, train=True, download=True).test_data.numpy()
870562877997826fd9627b9eb3890323171ea41841499caec4c8ea1ccddfeea4
870562877997826fd9627b9eb3890323171ea41841499caec4c8ea1ccddfeea4
bash$ 
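The same check can be sketched in Python using only the standard library. The function name `rhs_digests` is made up for this example. Unlike the `sed` expression above, which strips everything up to the last `=`, this version splits on the first `=` so the whole right-hand side is hashed; the digests will therefore not match the ones printed by the shell session.

```python
import hashlib


def rhs_digests(lines):
    """Map each assigned variable name to the SHA-256 of its right-hand side."""
    digests = {}
    for line in lines:
        # Split on the first "=" only, so keyword arguments such as
        # train=True stay part of the right-hand side.
        name, _, rhs = line.partition("=")
        digests[name.strip()] = hashlib.sha256(rhs.strip().encode()).hexdigest()
    return digests


# The two training assignments as quoted from the README.
readme_lines = [
    "train_data = datasets.MNIST(data_dir, train=True, download=True).test_data.numpy()",
    "train_labels = datasets.MNIST(data_dir, train=True, download=True).test_data.numpy()",
]

digests = rhs_digests(readme_lines)
# Identical digests mean both variables are assigned the same expression.
print(digests["train_data"] == digests["train_labels"])  # prints True
```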

ecashin, Aug 04 '22 14:08

Good catch! Would you be interested in submitting a PR to fix this?

anishathalye, Aug 04 '22 14:08

Yes, I can try to submit a PR after work. I'll do the minimal change and ignore the torchvision warnings if that works.

.../venv/lib/python3.9/site-packages/torchvision/datasets/mnist.py:80: UserWarning: test_data has been renamed data
  warnings.warn("test_data has been renamed data")
.../venv/lib/python3.9/site-packages/torchvision/datasets/mnist.py:70: UserWarning: test_labels has been renamed targets
  warnings.warn("test_labels has been renamed targets")

ecashin, Aug 04 '22 14:08

Fixed in https://github.com/cleanlab/label-errors/commit/27d5291ae292ebea9e591e5054615f3457d0ad21.

anishathalye, Aug 27 '22 00:08