The datasets are missing

Open marcograss opened this issue 6 years ago • 11 comments

Several datasets are missing, such as ./data/celeb/ or the GAN one, and I cannot find instructions for downloading them correctly in the book or the README.

Any hints?

marcograss avatar Aug 10 '19 11:08 marcograss

CelebA: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html You can get the files from the link to Google Drive or Baidu Drive. They are public datasets so no cost to download but you will need a Google account to access Google Drive.

You will also need to download the feature list csv file, list_attr_celeba.csv. I found it here: https://github.com/togheppi/cDCGAN/blob/master/list_attr_celeba.csv. You have to change the first header field from "filename" to "image_id". You can get it

The camel example is from the quickdraw_dataset https://github.com/googlecreativelab/quickdraw-dataset. The code uses .npy numpy files https://console.cloud.google.com/storage/quickdraw_dataset/full/numpy_bitmap. This also requires a Google account to access the Google Cloud Platform.

sfleisch avatar Sep 09 '19 19:09 sfleisch

Preparing these datasets for the notebooks is non-trivial. There should have been some instruction in the book. Can we at least get an errata online? Or an update to the notebooks on how to set them up?

RobAltena avatar Oct 01 '19 04:10 RobAltena

IANTA (I Am Not the Author) but: CelebA

  1. Get a Google account if you don't have one.
  2. Go to the CelebA website.
    1. Select the Aligned&Cropped Images. This will take you to the Google Drive site.
  3. Sign in with your Google account.
  4. Download:
    1. Colab Notebooks > GAN > CelebA > Anno > list_attr_celeba.txt
    2. Colab Notebooks > GAN > CelebA > Img > img_align_celeba.zip
  5. Unzip img_align_celeba.zip to path/to/GDL_Code/data/celeb. This will create img_align_celeba/*.jpg.
  6. Move list_attr_celeba.txt to path/to/GDL_Code/data/celeb/list_attr_celeba.csv.
    1. Delete the first line, which holds the count of the number of lines in the file (202599).
    2. Prefix the header line, which starts with "5_o_Clock_Shadow", with "image_id,...". That's it for CelebA.
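
The two edits in step 6 can also be scripted. This is only a sketch, and it makes two assumptions the steps above do not spell out: that list_attr_celeba.txt is whitespace-separated (a count line, a line of attribute names, then one row per image), and that the notebook wants comma-separated columns. The function name convert_attr_file is made up for illustration.

```python
from pathlib import Path


def convert_attr_file(txt_path, csv_path):
    """Convert CelebA's list_attr_celeba.txt into a comma-separated file.

    Assumed input layout: line 0 is the image count, line 1 lists the
    attribute names, and every later line is a file name followed by one
    flag per attribute, all separated by whitespace.
    """
    lines = Path(txt_path).read_text().splitlines()

    # Drop the count line and put "image_id" in front of the attribute names.
    attribute_names = lines[1].split()
    header = ",".join(["image_id"] + attribute_names)

    # Collapse runs of whitespace in each data row into single commas.
    rows = [",".join(line.split()) for line in lines[2:] if line.strip()]

    Path(csv_path).write_text("\n".join([header] + rows) + "\n")
    return len(rows)
```

Calling convert_attr_file("list_attr_celeba.txt", "list_attr_celeba.csv") from inside data/celeb returns the number of image rows written, which should match the count on the deleted first line.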

The QuickDraw dataset:

  1. With gsutil cp:

    1. Install gsutil into your Python environment. On Windows I'm using Anaconda 3 Python but still used pip: pip install gsutil. Be careful if you are on Linux and install from a deb or rpm: there is another utility with the same name. Also, if you installed Anaconda on Windows for all users, make sure you run pip in a command window as Administrator, otherwise the installation will fail.
    2. The gsutil command is: gsutil cp -r gs://quickdraw_dataset/full/numpy_bitmap/camel.npy local_path/to/GDL_Code/data/camel/

  2. From the browser:

    1. You will need a Google account.

    2. Go to the quickdraw numpy dataset bucket in the GCP browser.

    3. Select camel.npy, click on the ..., and download the file to your data/camel directory.
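
If neither route is convenient, a plain HTTPS download from Python might also work. This sketch assumes the bucket is publicly readable through storage.googleapis.com, which the steps above do not confirm; quickdraw_url and download_category are names made up for illustration.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path

# Assumption: the gs://quickdraw_dataset bucket is readable over plain HTTPS.
BASE_URL = "https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap"


def quickdraw_url(category):
    """Build the download URL for one category, e.g. 'camel'."""
    return f"{BASE_URL}/{urllib.parse.quote(category)}.npy"


def download_category(category, target_dir):
    """Stream <category>.npy into target_dir and return the local path."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{category}.npy"
    with urllib.request.urlopen(quickdraw_url(category)) as response:
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target


# Example call (performs a network request):
# download_category("camel", "path/to/GDL_Code/data/camel")
```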

sfleisch avatar Oct 06 '19 19:10 sfleisch

As @RobAltena already said, preparing those datasets isn't trivial, but based on what @sfleisch wrote, I think adding these instructions to the README, and maybe also as comments in the notebooks themselves, would be very beneficial (and perhaps to the second printing of the book too?).

Of course, it can be semi-automated, such as "Step 1, download it from that Google Drive location", "Step 2, shell/Python code that corresponds to the remaining instructions".

emres avatar Nov 25 '19 13:11 emres

The dataset can be easily downloaded from Kaggle and extracted in the path/to/GDL_Code/data/celeb directory.

deepankverma avatar Feb 05 '20 18:02 deepankverma

While loading the camel.npy data, I get the following error in method load_safari():

ValueError: Cannot load file containing pickled data when allow_pickle=False

Tried setting the value for allow_pickle=True, and then got this other error:

OSError: Failed to interpret file 'camel.npy' as a pickle

Any ideas what is going on? I'm stuck in Chapter 4 as I cannot load the data for training.

Thanks,

Manuel

manuelr417 avatar Apr 04 '20 15:04 manuelr417

It appears my file was corrupted during transfer. I fixed the problem by downloading the file by hand from the browser.
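
For anyone hitting the same pair of errors: np.load falls back to unpickling when a file does not begin with the .npy magic string, so a damaged download can surface as pickle errors rather than as an obvious "bad file" message. A quick check along these lines might save some time. It is a sketch that only inspects the first six bytes, so it will not catch a file that starts correctly but is truncated later; looks_like_npy is a made-up helper name.

```python
# Every .npy file starts with these six bytes.
NPY_MAGIC = b"\x93NUMPY"


def looks_like_npy(path):
    """Return True if the file begins with the .npy magic string."""
    with open(path, "rb") as handle:
        return handle.read(len(NPY_MAGIC)) == NPY_MAGIC
```

If looks_like_npy("data/camel/camel.npy") returns False, the file is probably not a NumPy array at all (for example an error page saved under that name), and downloading it again is the likely fix.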

Manuel

manuelr417 avatar Apr 04 '20 16:04 manuelr417

I'm having similar problems related to this discrepancy between the book and the code, and needing to download the camel dataset directly. I can't replicate the safari_loader() because something strange is happening at the slice_train step. It's just returning empty sets for the xtotal, ytotal.

What is the slice_train step doing in the loader? Maybe I will be able to work around it to get the code from the book working? What values should I be expecting there?

slice_train = int(80000/len(txt_name_list))  ###Setting value to be 80000 for the final dataset
...
        x = x[:slice_train]
        y = y[:slice_train]
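
Going only by the snippet above, slice_train looks like a per-category quota: 80000 divided by the number of category files, with each category then cut down to its first slice_train drawings so that the combined dataset ends up near 80000 samples. The following stand-alone sketch illustrates that reading. It is not the loader itself; per_class_quota and the sample lists are made up.

```python
def per_class_quota(total_samples, n_classes):
    """Mirror of int(80000 / len(txt_name_list)) from the snippet."""
    return int(total_samples / n_classes)


camel_only = per_class_quota(80000, 1)    # 80000: one file keeps everything up to 80000
ten_classes = per_class_quota(80000, 10)  # 8000: ten files keep 8000 drawings each

# Slicing keeps at most `quota` items from the front of each category.
drawings = list(range(100000))
kept = drawings[:ten_classes]             # 8000 items

# Slicing never adds data: an empty input stays empty whatever the quota is.
nothing = [][:camel_only]                 # []
```

Under that reading, empty xtotal and ytotal would point either at arrays that were already empty before the slice (for example no .npy file found in the data folder) or at a quota of 0, which only happens with more than 80000 files. Printing len(txt_name_list) and x.shape just before the slice should show which one it is.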

BBirdselllab avatar Jun 20 '20 16:06 BBirdselllab

Hey this is still not working. Here is an idea: before writing a book and having people buy it, try making sure all parts of your code and book make sense. Just an idea. This is horrible

josephlandau avatar Sep 18 '20 23:09 josephlandau

I have downloaded the data from Kaggle. When you try to upload it, it takes like forever since it's A LOT of files. I ran into an issue that in jupyter notebook, I had to manually click "upload" button for each of the zillion files, to confirm the upload. It would take forever, so I made this simple script you can run after uploading the images:

const buttons = document.getElementsByClassName('upload_button'); for (let button of buttons) { button.click(); }

Remember that the "celeb" folder has to contain at least one more subfolder into which you place the dataset, since that's how the image loader works.

Afterwards the training runs just fine.
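
A quick way to confirm the folder layout before training is to count the images in each subfolder. This is a sketch; it assumes the images are .jpg files and that the notebook uses a directory-based loader that treats each subfolder of data/celeb as one class (Keras's flow_from_directory works that way, though that is a guess about the notebook). image_subfolders is a made-up helper name.

```python
from pathlib import Path


def image_subfolders(root):
    """Map each direct subfolder of root to the number of .jpg files in it."""
    counts = {}
    for child in sorted(Path(root).iterdir()):
        if child.is_dir():
            counts[child.name] = sum(1 for _ in child.glob("*.jpg"))
    return counts
```

An empty dictionary from image_subfolders("data/celeb") means the images sit directly in the celeb folder and need to be moved one level down.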

DrewJay avatar Oct 11 '20 22:10 DrewJay