tensorflow_datasets.load('cats_vs_dogs') is not working!
The code below is not working:
# load the dataset module
import tensorflow_datasets as tfds
# disable the download progress bar
tfds.disable_progress_bar()
# download the data - cats vs dogs
_ = tfds.load('cats_vs_dogs',      # dataset name
              as_supervised=False, # return a dict of features rather than (image, label) pairs
              )
ERROR: -> DownloadError: Failed to get url https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip. HTTP code: 404.
Environment - Google Colab
Yes, I get the same error.
Getting this error too. I've been downloading the archived version manually.
@salray99 how do you use the archived version? I'm getting the same error too -- I think the Microsoft link has moved.
I see the same dataset here: https://www.microsoft.com/en-us/download/details.aspx?id=54765. You could download it to your local environment, or upload it to whatever hosted environment you are running your code in.
I got the error too. But I think we can download it directly from the link given by @chikakorooney and then import it into Google Colab or Visual Studio (whatever the environment is). It worked for me that way, but I had to convert the folder into a manually fed local dataset, which resets every time. Is there an easier method, or do I have to wait for TensorFlow to fix the link?
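For anyone going the manual route in the meantime, here is a minimal sketch (not the TFDS pipeline itself) that builds a tf.data.Dataset straight from the extracted folder; the PetImages path is an assumption about where you unpacked the archive, and note the archive contains corrupt images that TFDS normally filters out, so you may still hit decode errors while iterating:

import tensorflow as tf

# Assumed location of the manually downloaded and extracted archive,
# containing the Cat/ and Dog/ subfolders.
DATA_DIR = "PetImages"

# Labels are inferred from the Cat/ and Dog/ subdirectory names.
ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR,
    image_size=(224, 224),
    batch_size=32,
)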
Hi. I have a temporary solution below to modify the URL:
setattr(tfds.image_classification.cats_vs_dogs, '_URL',"https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip")
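For context, a minimal sketch of how this workaround slots into a full script, assuming a TFDS version where the cats_vs_dogs module still reads the module-level _URL at download time:

import tensorflow_datasets as tfds

# Point the builder at the renamed Microsoft archive before the first load call.
setattr(tfds.image_classification.cats_vs_dogs, '_URL',
        "https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip")

# Subsequent downloads use the patched URL.
ds, info = tfds.load('cats_vs_dogs', as_supervised=True, with_info=True)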
Thanks, the workaround above worked for me.
# coding=utf-8
# Copyright 2022 The TensorFlow Datasets Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Cats vs Dogs dataset."""

import re

from absl import logging
import tensorflow as tf
import tensorflow_datasets.public_api as tfds

_CITATION = """
@Inproceedings (Conference){asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization,
author = {Elson, Jeremy and Douceur, John (JD) and Howell, Jon and Saul, Jared},
title = {Asirra: A CAPTCHA that Exploits Interest-Aligned Manual Image Categorization},
booktitle = {Proceedings of 14th ACM Conference on Computer and Communications Security (CCS)},
year = {2007},
month = {October},
publisher = {Association for Computing Machinery, Inc.},
url = {https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/},
edition = {Proceedings of 14th ACM Conference on Computer and Communications Security (CCS)},
}
"""

_URL = ("https://download.microsoft.com/download/3/E/1/3E1C3F21-"
        "ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip")

_NUM_CORRUPT_IMAGES = 1738

_DESCRIPTION = (("A large set of images of cats and dogs. "
                 "There are %d corrupted images that are dropped.") %
                _NUM_CORRUPT_IMAGES)

_NAME_RE = re.compile(r"^PetImages[\\/](Cat|Dog)[\\/]\d+\.jpg$")


class CatsVsDogs(tfds.core.GeneratorBasedBuilder):
  """Cats vs Dogs."""

  VERSION = tfds.core.Version("4.0.0")
  RELEASE_NOTES = {
      "4.0.0": "New split API (https://tensorflow.org/datasets/splits)",
  }

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        description=_DESCRIPTION,
        features=tfds.features.FeaturesDict({
            "image": tfds.features.Image(),
            "image/filename": tfds.features.Text(),  # eg 'PetImages/Dog/0.jpg'
            "label": tfds.features.ClassLabel(names=["cat", "dog"]),
        }),
        supervised_keys=("image", "label"),
        homepage="https://www.microsoft.com/en-us/download/details.aspx?id=54765",
        citation=_CITATION,
    )

  def _split_generators(self, dl_manager):
    path = dl_manager.download(_URL)

    # There is no predefined train/val/test split for this dataset.
    return [
        tfds.core.SplitGenerator(
            name=tfds.Split.TRAIN,
            gen_kwargs={
                "archive": dl_manager.iter_archive(path),
            }),
    ]

  def _generate_examples(self, archive):
    """Generate Cats vs Dogs images and labels given a directory path."""
    num_skipped = 0
    for fname, fobj in archive:
      res = _NAME_RE.match(fname)
      if not res:  # README file, ...
        continue
      label = res.group(1).lower()
      if tf.compat.as_bytes("JFIF") not in fobj.peek(10):
        num_skipped += 1
        continue
      record = {
          "image": fobj,
          "image/filename": fname,
          "label": label,
      }
      yield fname, record

    if num_skipped != _NUM_CORRUPT_IMAGES:
      raise ValueError("Expected %d corrupt images, but found %d" %
                       (_NUM_CORRUPT_IMAGES, num_skipped))
    logging.warning("%d images were corrupted and were skipped", num_skipped)
The above is all copied and pasted from the current file: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/image_classification/cats_vs_dogs.py. It is good now.
Thank you @MegaCreater for raising the issue, and to all contributors for their comments.
PR #3923 should have solved the issue!
Thanks, it worked for me.
Why do I get this error when I run the setattr workaround: module 'tensorflow_datasets' has no attribute 'image_classification'?
Hi @lingxiaoW, is tfds.load("cats_vs_dogs") not working for you?
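One possible cause (an assumption, not confirmed in this thread) is that the image_classification submodule is not exposed as an attribute of tensorflow_datasets until it is imported explicitly in some TFDS builds. A hedged variant of the workaround that sidesteps the attribute lookup:

import tensorflow_datasets as tfds
# Import the dataset module directly instead of going through the tfds attribute.
from tensorflow_datasets.image_classification import cats_vs_dogs

cats_vs_dogs._URL = ("https://download.microsoft.com/download/3/E/1/"
                     "3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_5340.zip")
ds = tfds.load('cats_vs_dogs', as_supervised=True)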
I am also getting an error like this one. I have looked everywhere; if anyone knows the resolution, please help.
When I tried this code:
data = tfds.load('cats_vs_dogs', as_supervised=True)
I also tried the setattr command suggested earlier, but it is not working either. Using Windows 10, tfds version '4.9.3+nightly', and tensorflow '2.12.0'.
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to C:\Users\XYZ.ABC\tensorflow_datasets\cats_vs_dogs\4.0.1...
Dl Completed...: 100% 1/1 [00:00<00:00, 18.50 url/s]
Dl Size...: 100% 824887076/824887076 [00:00<00:00, 21797207565.24 MiB/s]
Generating splits...: 0% 0/1 [00:00<?, ? splits/s]
KeyError                                  Traceback (most recent call last)
Cell In[3], line 1
----> 1 data = tfds.load('cats_vs_dogs', as_supervised=True)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\logging\__init__.py:168, in _FunctionDecorator.__call__(self, function, instance, args, kwargs)
--> 168 return function(*args, **kwargs)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\load.py:640, in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
--> 640 _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\load.py:499, in _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
--> 499 dbuilder.download_and_prepare(**download_and_prepare_kwargs)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\dataset_builder.py:691, in DatasetBuilder.download_and_prepare(self, download_dir, download_config, file_format)
--> 691 self._download_and_prepare(dl_manager=dl_manager, download_config=download_config)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\dataset_builder.py:1584, in GeneratorBasedBuilder._download_and_prepare(self, dl_manager, download_config)
--> 1584 future = split_builder.submit_split_generation(split_name=split_name, generator=generator, filename_template=filename_template, disable_shuffling=self.info.disable_shuffling)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\split_builder.py:341, in SplitBuilder.submit_split_generation(self, split_name, generator, filename_template, disable_shuffling)
--> 341 return self._build_from_generator(**build_kwargs)

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\core\split_builder.py:406, in SplitBuilder._build_from_generator(self, split_name, generator, filename_template, disable_shuffling)
--> 406 for key, example in utils.tqdm(generator, desc=f'Generating {split_name} examples...', unit=' examples', total=total_num_examples, leave=False, mininterval=1.0):

File C:\env310\py311\Lib\site-packages\tqdm\notebook.py:259, in tqdm_notebook.__iter__(self)
--> 259 for obj in it:

File C:\env310\py311\Lib\site-packages\tqdm\std.py:1195, in tqdm.__iter__(self)
--> 1195 for obj in iterable:

File C:\env310\py311\Lib\site-packages\tensorflow_datasets\image_classification\cats_vs_dogs.py:117, in CatsVsDogs._generate_examples(self, archive)
    115 with zipfile.ZipFile(buffer, "w") as new_zip:
    116     new_zip.writestr(fname, img_recoded.numpy())
--> 117 new_fobj = zipfile.ZipFile(buffer).open(fname)

File ~\AppData\Local\Programs\Python\Python311\Lib\zipfile.py:1544, in ZipFile.open(self, name, mode, pwd, force_zip64)
--> 1544 zinfo = self.getinfo(name)

File ~\AppData\Local\Programs\Python\Python311\Lib\zipfile.py:1473, in ZipFile.getinfo(self, name)
--> 1473 raise KeyError('There is no item named %r in the archive' % name)

KeyError: "There is no item named 'PetImages\\Cat\\0.jpg' in the archive"
Regarding the last comment, I was getting the same issue. After some poking about, this looks like a problem with the method _generate_examples() on tensorflow_datasets.image_classification.cats_vs_dogs.CatsVsDogs. In that method, the following line...
new_fobj = zipfile.ZipFile(buffer).open(fname)
...is causing the exception. The problem is with fname: once written into the in-memory ZipFile a few lines prior, the path separator may end up being different in the in-memory ZipFile buffer than in the fname variable itself, leading to the KeyError 'There is no item named some\path\or\other.ext in the archive'.
I managed to hack my way past it by replacing the _generate_examples method with one I generated on-the-fly that replaced the line above with...
new_fobj = zipfile.ZipFile(buffer).open(fname.replace('\\', '/'))
...but the fix that needs to be pulled into the repository would need to be a bit hardier than that.
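A small repro sketch of the separator mismatch described above (the final line raises the KeyError only on Windows, where ZipInfo rewrites os.sep to '/' in the stored entry name; on other platforms the backslash name is stored verbatim and the lookup succeeds):

import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    # On Windows, ZipInfo normalises the '\\' separators to '/' when storing the name.
    zf.writestr("PetImages\\Cat\\0.jpg", b"fake image bytes")

print(zipfile.ZipFile(buffer).namelist())               # ['PetImages/Cat/0.jpg'] on Windows
zipfile.ZipFile(buffer).open("PetImages\\Cat\\0.jpg")   # KeyError on Windows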
I am using Windows 10 with VS Code and running in a virtual environment. When I use tfds.load('cats_vs_dogs', as_supervised=True), the download finishes, but just before it starts splitting it crashes and gives this error. Does anyone know how to fix this? I'd be very grateful.
Traceback (most recent call last):
File "c:\Users\Windows 10\Documents\MyProjects\ForConvolutionsTensorflow\tessssss.py", line 28, in
Please explain how you were able to fix this; I would really appreciate it. Thank you. https://github.com/tensorflow/datasets/issues/3918#issuecomment-1890147832
Here's the code I used to get past the issue. The line I changed is prefaced by a comment that says HACKY FIX.
import tensorflow as tf
import tensorflow_datasets as tfds
import io
import zipfile
import logging


def __generate_examples(self, archive):
    num_skipped = 0
    for fname, fobj in archive:
        res = tfds.image_classification.cats_vs_dogs._NAME_RE.match(fname)
        if not res:  # README file, ...
            continue
        label = res.group(1).lower()
        if tf.compat.as_bytes("JFIF") not in fobj.peek(10):
            num_skipped += 1
            continue

        img_data = fobj.read()
        img_tensor = tf.image.decode_image(img_data)
        img_recoded = tf.io.encode_jpeg(img_tensor)

        # Converting the recoded image back into a zip file container.
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w") as new_zip:
            new_zip.writestr(fname, img_recoded.numpy())
        buffer.seek(0)

        # HACKY FIX
        new_fobj = zipfile.ZipFile(buffer).open(fname.replace('\\', '/'))

        record = {
            "image": new_fobj,
            "image/filename": fname,
            "label": label,
        }
        yield fname, record

    if num_skipped != tfds.image_classification.cats_vs_dogs._NUM_CORRUPT_IMAGES:
        raise ValueError(
            "Expected %d corrupt images, but found %d"
            % (tfds.image_classification.cats_vs_dogs._NUM_CORRUPT_IMAGES, num_skipped)
        )
    logging.warning("%d images were corrupted and were skipped", num_skipped)


tfds.image_classification.cats_vs_dogs.CatsVsDogs._generate_examples = __generate_examples
data, metadata = tfds.load('cats_vs_dogs', as_supervised=True, with_info=True)
Did you also add this buffer.seek(0) before the hacky-fix line? From the source code here, it is not there. Thank you. https://github.com/tensorflow/datasets/issues/3918#issuecomment-1892835410
I did not add buffer.seek(0).
Thank you. After using that, it loaded the data, but it shows me a TypeError: 'NoneType' object is not subscriptable. What could be the cause? https://github.com/tensorflow/datasets/issues/3918#issuecomment-1893946982 Thank you so much for your help. I don't think it generates the split. Here is my code:

import tensorflow_datasets as tfds
from utils import _generate_examples

tfds.image_classification.cats_vs_dogs.CatsVsDogs._generate_examples = _generate_examples

train_data = tfds.load('cats_vs_dogs', split='train[:80%]', data_dir='Datasets/training_dir', as_supervised=True)
test_data = tfds.load('cats_vs_dogs', split='train[80%:90%]', data_dir='Datasets/test_dir', as_supervised=True)
validation_data = tfds.load('cats_vs_dogs', split='train[-10%:]', data_dir='Datasets/validation_dir', as_supervised=True)
I didn't attempt this with the split parameter, so I can't comment on that.
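For what it is worth, a hedged variant of that snippet: keeping a single data_dir means the dataset is downloaded and prepared only once, and tfds.load accepts a list of split slices in one call (the 'Datasets' directory name here is just an assumption for illustration):

import tensorflow_datasets as tfds

# One data_dir, one preparation; three slices of the single 'train' split.
train_data, test_data, validation_data = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[-10%:]'],
    data_dir='Datasets',
    as_supervised=True,
)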
People are still facing this issue in the tensorflow/tensorflow repo (issue #84104). @ccl-core, can you please reopen the issue?
I believe this has to do with the difference in file paths in Windows vs Unix/Linux. I'll send out a CL to fix this.
A fix for this issue has been submitted through: https://github.com/tensorflow/datasets/commit/9969ce542f4b0e1cbf0a085e8e0df11bccea5c17.
In general, I solved this problem in the following way (Windows 10 Pro / 20H2, Python 3.12.6, tensorflow_datasets 4.9.7):
Create a Jupyter Notebook file - file_name.ipynb - with:

import tensorflow_datasets as tfds

datos, metadatos = tfds.load('cats_vs_dogs', as_supervised=True, with_info=True)
tfds.as_dataframe(datos['train'].take(5), metadatos)

I assembled the project on https://colab.research.google.com and checked what is created there after unpacking the archive into a folder. Then I downloaded those files to the folder C:\Users\user_!!!\tensorflow_datasets\cats_vs_dogs\4.0.1 (user_!!! - your Windows username). It worked for me.
Archive with files - https://drive.google.com/file/d/1HhDgpoBy5tJ_UGOt4hCSrNEr_6Zv-lMm/view?usp=sharing
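As a hedged follow-up, a sketch of what loading looks like once the prepared files are in place (download=False is optional and just makes the intent explicit; tfds should find the files under the default data directory without re-downloading):

import tensorflow_datasets as tfds

# Assumes the prepared files already sit under
# C:\Users\<your Windows username>\tensorflow_datasets\cats_vs_dogs\4.0.1
datos, metadatos = tfds.load('cats_vs_dogs', as_supervised=True,
                             with_info=True, download=False)
tfds.as_dataframe(datos['train'].take(5), metadatos)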