The datasets are missing
Several datasets are missing, such as ./data/celeb/ or the one for the GAN example, and I cannot find instructions on how to download them in the book or the README.
Any hints?
CelebA: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html You can get the files from the links to Google Drive or Baidu Drive. They are public datasets, so there is no cost to download, but you will need a Google account to access Google Drive.
You will also need to download the attribute list CSV file, list_attr_celeba.csv. I found it here: https://github.com/togheppi/cDCGAN/blob/master/list_attr_celeba.csv. You have to change the first header field from "filename" to "image_id".
The camel example is from the quickdraw_dataset https://github.com/googlecreativelab/quickdraw-dataset. The code uses .npy numpy files https://console.cloud.google.com/storage/quickdraw_dataset/full/numpy_bitmap. This also requires a Google account to access the Google Cloud Platform.
Preparing these datasets for the notebooks is non-trivial. There should have been some instruction in the book. Can we at least get an errata online? Or an update to the notebooks on how to set them up?
IANTA (I Am Not the Author) but: CelebA
- Get a Google account if you don't have one.
- Go to the CelebA website.
- Select the Aligned&Cropped Images. This will take you to the Google Drive site.
- Sign in with your Google account.
- Download:
  - Colab Notebooks > GAN > CelebA > Anno > list_attr_celeba.txt
  - Colab Notebooks > GAN > CelebA > Img > img_align_celeba.zip
- Unzip img_align_celeba.zip into path/to/GDL_Code/data/celeb; this creates img_align_celeba/*.jpg.
- Move list_attr_celeba.txt to path/to/GDL_Code/data/celeb/list_attr_celeba.csv.
- Delete the first line, which contains the number of images in the file (202599).
- Prepend "image_id," to the header line that starts with "5_o_Clock_Shadow". That's it for CelebA.
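The two edits to the attribute file can be scripted. A minimal sketch in Python, following the steps above literally (the paths in the commented call are placeholders; I'm assuming the loader only needs the count line dropped and the header prefixed):

```python
from pathlib import Path

def convert_attr_file(src: str, dst: str) -> None:
    """Turn list_attr_celeba.txt into the list_attr_celeba.csv the code expects."""
    lines = Path(src).read_text().splitlines()
    # Drop the first line (the image count, 202599).
    header, *rows = lines[1:]
    # The header starts with "5_o_Clock_Shadow"; prepend the missing column name.
    header = "image_id," + header
    Path(dst).write_text("\n".join([header, *rows]) + "\n")

# convert_attr_file("path/to/GDL_Code/data/celeb/list_attr_celeba.txt",
#                   "path/to/GDL_Code/data/celeb/list_attr_celeba.csv")
```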
The QuickDraw dataset:

With gsutil cp:
- Install gsutil into your Python environment. On Windows I'm using Anaconda 3 Python but still used pip:

```
pip install gsutil
```

Be careful if you are on Linux and install from a deb or rpm: there is another utility with the same name. Also, if you installed Anaconda on Windows for all users, make sure you run pip in a command window as Administrator, otherwise the installation will fail.
- The gsutil command is:

```
gsutil cp gs://quickdraw_dataset/full/numpy_bitmap/camel.npy path/to/GDL_Code/data/camel/
```
From the browser:
- You will need a Google account.
- Go to the quickdraw numpy dataset bucket in the GCP browser.
- Select camel.npy and download the file to your data/camel directory.
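If neither gsutil nor clicking through the console appeals, the bucket is public, so the same file can be fetched over plain HTTPS. A sketch — the storage.googleapis.com URL form is my assumption about how the public bucket is exposed, and the destination path is a placeholder:

```python
import urllib.parse
import urllib.request
from pathlib import Path

BUCKET = "https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap"

def quickdraw_url(category: str) -> str:
    # Category names with spaces (e.g. "ice cream") need URL-encoding.
    return f"{BUCKET}/{urllib.parse.quote(category)}.npy"

def download_category(category: str, dest_dir: str) -> Path:
    # Fetch <category>.npy into dest_dir, creating the directory if needed.
    dest = Path(dest_dir) / f"{category}.npy"
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(quickdraw_url(category), str(dest))
    return dest

# download_category("camel", "path/to/GDL_Code/data/camel")
```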
As @RobAltena already said, preparing those datasets isn't trivial, but based on what @sfleisch wrote, I think adding these instructions to the README, and maybe also as comments in the notebooks themselves, would be very beneficial (and perhaps to the second printing of the book too?).
Of course, it can be semi-automated, such as "Step 1, download it from that Google Drive location", "Step 2, shell/Python code that corresponds to the remaining instructions".
The dataset can be easily downloaded from Kaggle and extracted in the path/to/GDL_Code/data/celeb directory.
While loading the camel.npy data, I get the following error in method load_safari():
ValueError: Cannot load file containing pickled data when allow_pickle=False
Tried setting the value for allow_pickle=True, and then got this other error:
OSError: Failed to interpret file 'camel.npy' as a pickle
Any ideas what is going on? I'm stuck in Chapter 4 as I cannot load the data for training.
Thanks,
Manuel
It appears my file was corrupted during transfer. I fixed the problem by downloading the file by hand from the browser.
Manuel
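For anyone hitting the same pickle errors: a truncated or corrupted download (for example, an HTML error page saved as camel.npy) fails in exactly this way. Every NPY file starts with the magic bytes \x93NUMPY, so a quick integrity check is possible before retrying the loader; a sketch, with an example path:

```python
def looks_like_npy(path: str) -> bool:
    """True if the file begins with the NPY format's 6-byte magic prefix."""
    with open(path, "rb") as f:
        return f.read(6) == b"\x93NUMPY"

# if not looks_like_npy("data/camel/camel.npy"):
#     print("camel.npy is not a valid NumPy file -- re-download it")
```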
I'm having similar problems related to this discrepancy between the book and the code, and needing to download the camel dataset directly. I can't replicate the safari_loader() because something strange is happening at the slice_train step. It's just returning empty sets for the xtotal, ytotal.
What is the slice_train step doing in the loader? Maybe I will be able to work around it to get the code from the book working? What values should I be expecting there?
slice_train = int(80000/len(txt_name_list)) ###Setting value to be 80000 for the final dataset
...
x = x[:slice_train]
y = y[:slice_train]
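For what it's worth, from the snippet above slice_train looks like an equal per-category sample budget: the 80,000-sample total is split evenly across however many category files were found, and each category's arrays are truncated to that budget. With only camel.npy present it should be the full 80,000. A sketch of the arithmetic (empty xtotal/ytotal would instead suggest the file list itself came back empty, i.e. the glob matched no .npy files):

```python
def per_category_budget(txt_name_list, total=80000):
    # Split the overall sample budget evenly across the category files found.
    # With one file (camel.npy) the whole budget goes to that category; note
    # an empty txt_name_list would raise ZeroDivisionError here, so empty
    # outputs more likely mean the loop over files never ran at all.
    return int(total / len(txt_name_list))
```

So with the camel-only setup, x[:slice_train] is x[:80000], which keeps everything if the file has fewer drawings than that.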
Hey this is still not working. Here is an idea: before writing a book and having people buy it, try making sure all parts of your code and book make sense. Just an idea. This is horrible
I have downloaded the data from Kaggle. When you try to upload it, it takes forever since there are A LOT of files. I ran into an issue in Jupyter Notebook: I had to manually click the "upload" button for each of the zillion files to confirm the upload. It would take forever, so I made this simple script you can run after adding the images:
```js
const buttons = document.getElementsByClassName('upload_button');
for (const button of buttons) {
  button.click();
}
```
Remember that the "celeb" folder has to contain at least one more subfolder into which you place the dataset, since that's how the image loader works.
Afterwards the training runs just fine.
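To make the subfolder requirement concrete, here is a sketch that arranges a download the way a Keras-style flow_from_directory loader expects — images inside a subfolder under data/celeb rather than directly in it. The folder names and the flat-download assumption are mine:

```python
from pathlib import Path
import shutil

def arrange_celeb(download_dir: str, data_dir: str = "data/celeb") -> Path:
    # Loaders that infer labels from directory names need the images in at
    # least one subfolder under data/celeb, not loose in data/celeb itself.
    target = Path(data_dir) / "img_align_celeba"
    target.mkdir(parents=True, exist_ok=True)
    for jpg in Path(download_dir).glob("*.jpg"):
        shutil.move(str(jpg), str(target / jpg.name))
    return target
```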