tensorflow_datasets.load('cats_vs_dogs') is not working!
The code below is not working:
# load the dataset module
import tensorflow_datasets as tfds
# disable the download progress bar
tfds.disable_progress_bar()
# download the data - cats vs dogs
_ = tfds.load('cats_vs_dogs',      # dataset name
              as_supervised=False, # return a dict of features rather than (image, label) pairs
              )
ERROR: -> DownloadError: Failed to get url https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip. HTTP code: 404.
Environment - Google Colab
Yes, I get the same error.
Getting this error too. I've been downloading the archived version manually.
@salray99 how do you use the archived version? I'm getting the same error too -- I think the Microsoft link has moved.
I see the same dataset here: https://www.microsoft.com/en-us/download/details.aspx?id=54765. You could download it to your local environment, or upload it to whatever hosted environment you are running your code in.
I got the error too. But I think we can download it directly from the link given by @chikakorooney and then import it into Google Colab or Visual Studio (whatever the environment is). It worked for me that way, but I had to convert the folder into a manually fed local dataset, which resets every time. Is there an easier method, or do I have to wait for TensorFlow to fix the link?
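For anyone going the manual route in the meantime, here is a minimal sketch (not the TFDS pipeline itself) that builds a tf.data.Dataset straight from the extracted folder; the PetImages path is an assumption about where you unpacked the archive, and note the archive contains corrupt images that TFDS normally filters out, so you may still hit decode errors while iterating:

import tensorflow as tf

# Assumed location of the manually downloaded and extracted archive,
# containing the Cat/ and Dog/ subfolders.
DATA_DIR = "PetImages"

# Labels are inferred from the Cat/ and Dog/ subdirectory names.
ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR,
    image_size=(224, 224),
    batch_size=32,
)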
Hi. I have a temporary solution below to modify the URL:
setattr(tfds.image_classification.cats_vs_dogs, '_URL',"https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip")
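For context, a minimal sketch of how this workaround slots into a full script, assuming a TFDS version where the cats_vs_dogs module still reads the module-level _URL at download time:

import tensorflow_datasets as tfds

# Point the builder at the renamed Microsoft archive before the first load call.
setattr(tfds.image_classification.cats_vs_dogs, '_URL',
        "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip")

# Subsequent downloads use the patched URL.
ds, info = tfds.load('cats_vs_dogs', as_supervised=True, with_info=True)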
Thanks, the workaround above worked for me.
# coding=utf-8
# Copyright 2022 The TensorFlow Datasets Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Cats vs Dogs dataset."""

import re

from absl import logging
import tensorflow as tf
import tensorflow_datasets.public_api as tfds

_CITATION = """
@Inproceedings (Conference){asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization,
author = {Elson, Jeremy and Douceur, John (JD) and Howell, Jon and Saul, Jared},
title = {Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization},
booktitle = {Proceedings of 14th ACM Conference on Computer and Communications Security (CCS)},
year = {2007},
month = {October},
publisher = {Association for Computing Machinery, Inc.},
url = {https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/},
edition = {Proceedings of 14th ACM Conference on Computer and Communications Security (CCS)},
}
"""

_URL = ("https://download.microsoft.com/download/3/E/1/3E1C3F21-"
        "ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip")

_NUM_CORRUPT_IMAGES = 1738

_DESCRIPTION = (("A large set of images of cats and dogs. "
                 "There are %d corrupted images that are dropped.") %
                _NUM_CORRUPT_IMAGES)

_NAME_RE = re.compile(r"^PetImages[\\/](Cat|Dog)[\\/]\d+\.jpg$")


class CatsVsDogs(tfds.core.GeneratorBasedBuilder):
  """Cats vs Dogs."""

  VERSION = tfds.core.Version("4.0.0")
  RELEASE_NOTES = {
      "4.0.0": "New split API (https://tensorflow.org/datasets/splits)",
  }

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        description=_DESCRIPTION,
        features=tfds.features.FeaturesDict({
            "image": tfds.features.Image(),
            "image/filename": tfds.features.Text(),  # eg 'PetImages/Dog/0.jpg'
            "label": tfds.features.ClassLabel(names=["cat", "dog"]),
        }),
        supervised_keys=("image", "label"),
        homepage="https://www.microsoft.com/en-us/download/details.aspx?id=54765",
        citation=_CITATION,
    )

  def _split_generators(self, dl_manager):
    path = dl_manager.download(_URL)

    # There is no predefined train/val/test split for this dataset.
    return [
        tfds.core.SplitGenerator(
            name=tfds.Split.TRAIN,
            gen_kwargs={
                "archive": dl_manager.iter_archive(path),
            }),
    ]

  def _generate_examples(self, archive):
    """Generate Cats vs Dogs images and labels given a directory path."""
    num_skipped = 0
    for fname, fobj in archive:
      res = _NAME_RE.match(fname)
      if not res:  # README file, ...
        continue
      label = res.group(1).lower()
      if tf.compat.as_bytes("JFIF") not in fobj.peek(10):
        num_skipped += 1
        continue
      record = {
          "image": fobj,
          "image/filename": fname,
          "label": label,
      }
      yield fname, record

    if num_skipped != _NUM_CORRUPT_IMAGES:
      raise ValueError("Expected %d corrupt images, but found %d" %
                       (_NUM_CORRUPT_IMAGES, num_skipped))
    logging.warning("%d images were corrupted and were skipped", num_skipped)
The above is all copied and pasted from the current file: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/image_classification/cats_vs_dogs.py. It is good now.
Thank you @MegaCreater for raising the issue, and to all contributors for their comments.
PR #3923 should have solved the issue!
Thanks, it worked for me.
Why do I get this error when I run the setattr workaround: module 'tensorflow_datasets' has no attribute 'image_classification'?
Hi @lingxiaoW, is tfds.load("cats_vs_dogs") not working for you?
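One possible cause (an assumption, not confirmed in this thread) is that the image_classification submodule is not exposed as an attribute of tensorflow_datasets until it is imported explicitly in some TFDS builds. A hedged variant of the workaround that sidesteps the attribute lookup:

import tensorflow_datasets as tfds
# Import the dataset module directly instead of going through the tfds attribute.
from tensorflow_datasets.image_classification import cats_vs_dogs

cats_vs_dogs._URL = ("https://download.microsoft.com/download/3/E/1/"
                     "3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip")
ds = tfds.load('cats_vs_dogs', as_supervised=True)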
I am also getting an error like this one. I have looked everywhere; if anyone knows the resolution, please help.
When I tried this code:
data = tfds.load('cats_vs_dogs', as_supervised=True)
I also tried the setattr command suggested earlier, but it is not working either. Using Windows 10, tfds version '4.9.3+nightly', and tensorflow '2.12.0'.
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to C:\Users\XYZ.ABC\tensorflow_datasets\cats_vs_dogs\4.0.1...
Dl Completed...: 100% 1/1 [00:00<00:00, 18.50 url/s]
Dl Size...: 100% 824887076/824887076 [00:00<00:00, 21797207565.24 MiB/s]
Generating splits...: 0% 0/1 [00:00<?, ? splits/s]
KeyError                                  Traceback (most recent call last)
Cell In[3], line 1
----> 1 data = tfds.load('cats_vs_dogs', as_supervised=True)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\logging\__init__.py:168, in _FunctionDecorator.__call__(self, function, instance, args, kwargs)
--> 168 return function(*args, **kwargs)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\load.py:640, in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
--> 640 _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\load.py:499, in _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
--> 499 dbuilder.download_and_prepare(**download_and_prepare_kwargs)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\dataset_builder.py:691, in DatasetBuilder.download_and_prepare(self, download_dir, download_config, file_format)
--> 691 self._download_and_prepare(dl_manager=dl_manager, download_config=download_config)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\dataset_builder.py:1584, in GeneratorBasedBuilder._download_and_prepare(self, dl_manager, download_config)
--> 1584 future = split_builder.submit_split_generation(split_name=split_name, generator=generator, filename_template=filename_template, disable_shuffling=self.info.disable_shuffling)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\split_builder.py:341, in SplitBuilder.submit_split_generation(self, split_name, generator, filename_template, disable_shuffling)
--> 341 return self._build_from_generator(**build_kwargs)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\split_builder.py:406, in SplitBuilder._build_from_generator(self, split_name, generator, filename_template, disable_shuffling)
--> 406 for key, example in utils.tqdm(generator, desc=f'Generating {split_name} examples...', unit=' examples', total=total_num_examples, leave=False, mininterval=1.0):

File C:\env310\py311\Lib\site-packages\tqdm\notebook.py:259, in tqdm_notebook.__iter__(self)
--> 259 for obj in it:

File C:\env310\py311\Lib\site-packages\tqdm\std.py:1195, in tqdm.__iter__(self)
--> 1195 for obj in iterable:

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\image_classification\cats_vs_dogs.py:117, in CatsVsDogs._generate_examples(self, archive)
    115 with zipfile.ZipFile(buffer, "w") as new_zip:
    116     new_zip.writestr(fname, img_recoded.numpy())
--> 117 new_fobj = zipfile.ZipFile(buffer).open(fname)

File ~\AppData\Local\Programs\Python\Python311\Lib\zipfile.py:1544, in ZipFile.open(self, name, mode, pwd, force_zip64)
--> 1544 zinfo = self.getinfo(name)

File ~\AppData\Local\Programs\Python\Python311\Lib\zipfile.py:1473, in ZipFile.getinfo(self, name)
--> 1473 raise KeyError('There is no item named %r in the archive' % name)

KeyError: "There is no item named 'PetImages\\Cat\\0.jpg' in the archive"
Regarding the last comment, I was getting the same issue. After some poking about, this looks like a problem with the method _generate_examples() on tensorflow_datasets.image_classification.cats_vs_dogs.CatsVsDogs. In that method, the following line...
new_fobj = zipfile.ZipFile(buffer).open(fname)
...is causing the exception. The problem is with fname: once written into the in-memory ZipFile a few lines prior, the path separator may end up being different in the in-memory ZipFile buffer than in the fname variable itself, leading to the KeyError 'There is no item named some\path\or\other.ext in the archive'.
I managed to hack my way past it by replacing the _generate_examples method with one I generated on-the-fly that replaced the line above with...
new_fobj = zipfile.ZipFile(buffer).open(fname.replace('\\', '/'))
...but the fix that needs to be pulled into the repository would need to be a bit hardier than that.
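A small repro sketch of the separator mismatch described above (the final line raises the KeyError only on Windows, where ZipInfo rewrites os.sep to '/' in the stored entry name; on other platforms the backslash name is stored verbatim and the lookup succeeds):

import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    # On Windows, ZipInfo normalises the '\\' separators to '/' when storing the name.
    zf.writestr("PetImages\\Cat\\0.jpg", b"fake image bytes")

print(zipfile.ZipFile(buffer).namelist())               # ['PetImages/Cat/0.jpg'] on Windows
zipfile.ZipFile(buffer).open("PetImages\\Cat\\0.jpg")   # KeyError on Windows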
I am using Windows 10 with VS Code and running in a virtual environment. When I use tfds.load('cats_vs_dogs', as_supervised=True), the download finishes, but just before it starts splitting it crashes and gives this error. Does anyone know how to fix this? I'd be very grateful.
Traceback (most recent call last):
File "c:\Users\Windows 10\Documents\MyProjects\ForConvolutionsTensorflow\tessssss.py", line 28, in
Please explain how you were able to fix this; I would really appreciate it. Thank you. https://github.com/tensorflow/datasets/issues/3918#issuecomment-1890147832
Here's the code I used to get past the issue. The line I changed is prefaced by a comment that says HACKY FIX.
import tensorflow as tf
import tensorflow_datasets as tfds
import io
import zipfile
import logging


def __generate_examples(self, archive):
    num_skipped = 0
    for fname, fobj in archive:
        res = tfds.image_classification.cats_vs_dogs._NAME_RE.match(fname)
        if not res:  # README file, ...
            continue
        label = res.group(1).lower()
        if tf.compat.as_bytes("JFIF") not in fobj.peek(10):
            num_skipped += 1
            continue

        img_data = fobj.read()
        img_tensor = tf.image.decode_image(img_data)
        img_recoded = tf.io.encode_jpeg(img_tensor)

        # Converting the recoded image back into a zip file container.
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w") as new_zip:
            new_zip.writestr(fname, img_recoded.numpy())
        buffer.seek(0)

        # HACKY FIX
        new_fobj = zipfile.ZipFile(buffer).open(fname.replace('\\', '/'))

        record = {
            "image": new_fobj,
            "image/filename": fname,
            "label": label,
        }
        yield fname, record

    if num_skipped != tfds.image_classification.cats_vs_dogs._NUM_CORRUPT_IMAGES:
        raise ValueError(
            "Expected %d corrupt images, but found %d"
            % (tfds.image_classification.cats_vs_dogs._NUM_CORRUPT_IMAGES, num_skipped)
        )
    logging.warning("%d images were corrupted and were skipped", num_skipped)


tfds.image_classification.cats_vs_dogs.CatsVsDogs._generate_examples = __generate_examples
data, metadata = tfds.load('cats_vs_dogs', as_supervised=True, with_info=True)
Did you also add this buffer.seek(0) before the hacky-fix line? From the source code here, it is not there. Thank you. https://github.com/tensorflow/datasets/issues/3918#issuecomment-1892835410
I did not add buffer.seek(0).
Thank you. After using that, it loaded the data, but it shows me a TypeError: 'NoneType' object is not subscriptable. What could be the cause? https://github.com/tensorflow/datasets/issues/3918#issuecomment-1893946982 Thank you so much for your help. I don't think it generates the split. Here is my code:

import tensorflow_datasets as tfds
from utils import _generate_examples

tfds.image_classification.cats_vs_dogs.CatsVsDogs._generate_examples = _generate_examples

train_data = tfds.load('cats_vs_dogs', split='train[:80%]', data_dir='Datasets/training_dir', as_supervised=True)
test_data = tfds.load('cats_vs_dogs', split='train[80%:90%]', data_dir='Datasets/test_dir', as_supervised=True)
validation_data = tfds.load('cats_vs_dogs', split='train[-10%:]', data_dir='Datasets/validation_dir', as_supervised=True)
I didn't attempt this with the split parameter, so I can't comment on that.
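For what it is worth, a hedged variant of that snippet: keeping a single data_dir means the dataset is downloaded and prepared only once, and tfds.load accepts a list of split slices in one call (the 'Datasets' directory name here is just an assumption for illustration):

import tensorflow_datasets as tfds

# One data_dir, one preparation; three slices of the single 'train' split.
train_data, test_data, validation_data = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[-10%:]'],
    data_dir='Datasets',
    as_supervised=True,
)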
People are still facing this issue in the tensorflow/tensorflow repo (issue #84104). @ccl-core, can you please reopen the issue?
I believe this has to do with the difference in file paths in Windows vs Unix/Linux. I'll send out a CL to fix this.
A fix for this issue has been submitted through: https://github.com/tensorflow/datasets/commit/9969ce542f4b0e1cbf0a085e8e0df11bccea5c17.
In general, I solved this problem in the following way (Windows 10 Pro / 20H2, Python 3.12.6, tensorflow_datasets 4.9.7):
Create a Jupyter Notebook file - file_name.ipynb - with:

import tensorflow_datasets as tfds

datos, metadatos = tfds.load('cats_vs_dogs', as_supervised=True, with_info=True)
tfds.as_dataframe(datos['train'].take(5), metadatos)

I assembled the project on https://colab.research.google.com and checked what is created there after unpacking the archive into a folder. Then I downloaded those files to the folder C:\Users\user_!!!\tensorflow_datasets\cats_vs_dogs\4.0.1 (user_!!! - your Windows username). It worked for me.
Archive with files - https://drive.google.com/file/d/1HhDgpoBy5tJ_UGOt4hCSrNEr_6Zv-lMm/view?usp=sharing
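As a hedged follow-up, a sketch of what loading looks like once the prepared files are in place (download=False is optional and just makes the intent explicit; tfds should find the files under the default data directory without re-downloading):

import tensorflow_datasets as tfds

# Assumes the prepared files already sit under
# C:\Users\<your Windows username>\tensorflow_datasets\cats_vs_dogs\4.0.1
datos, metadatos = tfds.load('cats_vs_dogs', as_supervised=True,
                             with_info=True, download=False)
tfds.as_dataframe(datos['train'].take(5), metadatos)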