
How to load my own custom data?

Open nikkisingh111333 opened this issue 4 years ago • 10 comments

Hello, I'm new to GAN architectures. I can't find any DataLoader for custom datasets; all I can see is MNIST, FashionMNIST, but no implementation for a custom dataset. What should I do? Please provide some help.

nikkisingh111333 avatar Nov 01 '21 18:11 nikkisingh111333

Hello @nikkisingh111333, thanks for the question, always happy to help :) The internal DatasetLoader class is specifically designed to load test data from our public servers (we currently host CIFAR10, CIFAR100, MNIST, FashionMNIST and CelebA publicly). Most of these simply return numpy arrays with the shapes:

  • images: (numberOfExamples, Channels, numberOfxPixels, numberOfyPixels)
  • labels: (numberOfExamples, numberOfLabelFeatures)

Only the CelebA dataset returns a torch.utils.data.DataLoader, because due to its size it is loaded into memory in chunks during training.

So you can simply pass numpy arrays as input. Internally they are converted to a vegans DataSet class (as an intermediate step), which is immediately transformed into a torch.utils.data.DataLoader. So if you are familiar with pytorch DataLoaders, you can also pass one directly as the X_train argument of the .fit(...) call.
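As a minimal sketch of what that could look like with plain numpy (the data here is random toy data; the commented-out model.fit call at the end is a placeholder, since the exact vegans model class depends on which GAN you pick):

```python
import numpy as np

# Toy data in the shape vegans expects:
# images: (numberOfExamples, Channels, numberOfxPixels, numberOfyPixels)
X_train = np.random.rand(100, 1, 28, 28).astype(np.float32)

# One-hot encoded labels: (numberOfExamples, numberOfLabelFeatures)
class_ids = np.random.randint(0, 10, size=100)
y_train = np.eye(10)[class_ids]

print(X_train.shape)  # (100, 1, 28, 28)
print(y_train.shape)  # (100, 10)

# These arrays could then be passed to a vegans model, roughly:
# model.fit(X_train, y_train)   # `model` is a placeholder for a vegans GAN instance
```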

If you need more help setting the data up, let us know. There are a few notebooks and code snippets dealing with some test data; those might also help. It would also help to tell us what you are trying to do and what the data looks like, so we can see what a good solution for you might be.

Doc for torch DataLoader: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

Let us know if this closes the issue for you :)

tneuer avatar Nov 05 '21 11:11 tneuer

Hello, can I use torchvision to load my dataset and then use it to train a vegans model? If so, how can I do that?

nikkisingh111333 avatar Nov 08 '21 12:11 nikkisingh111333

Which dataset are we talking about exactly? I tried it quickly for CIFAR10 and something like this could work:

import torchvision
import numpy as np

# Download CIFAR10; each element of `data` is a (PIL.Image, label) pair
data = torchvision.datasets.CIFAR10("./CIFAR10", download=True)

# Stack the images and labels into numpy arrays
images = np.array([np.array(image) for image, _ in data])
labels = np.array([label for _, label in data])

print(images.shape)  # (50000, 32, 32, 3)
print(labels.shape)  # (50000,)

Then you need to one-hot encode the label vector (if you even want to use it, are you using a conditional model?). Additionally you need to bring the data into the correct shape, from (50000, 32, 32, 3) to (50000, 3, 32, 32).

You could use images = vegans.utils.invert_channel_order(images) for that. These numpy arrays can then be passed to any vegans model.
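In plain numpy, the two preprocessing steps above (moving the channel axis and one-hot encoding the labels) might look roughly like this; `np.transpose` is a generic alternative to the vegans helper mentioned above, and the data here is a dummy stand-in for the CIFAR10 arrays:

```python
import numpy as np

# Dummy stand-in for the torchvision output: (N, H, W, C) images, integer labels
images = np.zeros((50000, 32, 32, 3), dtype=np.uint8)
labels = np.random.randint(0, 10, size=50000)

# Move the channel axis to position 1: (N, H, W, C) -> (N, C, H, W)
images = np.transpose(images, (0, 3, 1, 2))

# One-hot encode the integer labels into (N, 10)
labels_onehot = np.eye(10)[labels]

print(images.shape)         # (50000, 3, 32, 32)
print(labels_onehot.shape)  # (50000, 10)
```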

But then again, if you want CIFAR10 you can just use the vegans DatasetLoader, which takes these steps for you :)

If it's about your own data, I would mainly recommend using numpy arrays. If the data is too large to fit in memory, you might need torch DataLoaders. You could copy the code from the CelebA DataLoader for that if it helps you :)

tneuer avatar Nov 08 '21 13:11 tneuer

Hello, I have a dataset of various hand signs. I want to train a generator on that dataset. How can I load that dataset into vegans? I have already tried CIFAR10 and it works well, but I want to train on my own dataset. Please guide me regarding this.

nikkisingh111333 avatar Nov 08 '21 18:11 nikkisingh111333

Could you describe the issue in more detail?

  • How are your handsigns stored? As .png or similar in a separate folder? Or something else?
  • Do you have any labels you'd need to source as well?
  • Did you manage to load them into Python already? If so, how did you load them and what kind of object are they currently?
  • Is any preprocessing needed?
  • Is a specific vegans model currently complaining because you have the wrong data type or shape, or is vegans not being used yet?

tneuer avatar Nov 09 '21 07:11 tneuer

I did not try vegans on the hand signs yet because I don't know how to import the dataset. The images are .jpg files and the folder names are the labels. I just have images and labels (the folder names). How do I import this dataset into vegans?

nikkisingh111333 avatar Nov 09 '21 09:11 nikkisingh111333

So there is NO native VeGANs datatype. All VeGANs models accept either torch DataLoaders or numpy arrays as input, which are very common datatypes from different modules. You get both of these libraries as dependencies when installing vegans (via pip install vegans). You need to load your folder of jpgs yourself, maybe using Pillow (e.g. link here) or any other way. It is very hard for me to help with loading general data, because I don't know exactly how your data is stored physically...
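As a rough sketch for the folder layout described here (one subfolder per label, .jpg files inside), collecting file paths and integer labels needs only the standard library; the actual pixel loading is left as a comment because it depends on your image sizes and on having Pillow installed:

```python
from pathlib import Path
import numpy as np

def collect_paths_and_labels(root):
    """Walk a root folder whose subfolder names are the class labels
    and return image paths, integer labels, and the class names."""
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: i for i, name in enumerate(class_names)}

    paths, labels = [], []
    for name in class_names:
        for img_path in sorted((root / name).glob("*.jpg")):
            paths.append(img_path)
            labels.append(class_to_idx[name])
    return paths, np.array(labels), class_names

# paths, labels, class_names = collect_paths_and_labels("Gestures")
# The pixels could then be opened with Pillow and stacked, e.g.:
# images = np.array([np.array(Image.open(p).resize((64, 64))) for p in paths])
# followed by moving the channel axis: (N, H, W, C) -> (N, C, H, W).
```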

tneuer avatar Nov 09 '21 14:11 tneuer

Can you please share a piece of code for loading an image dataset using a pytorch DataLoader, and show how to use it in vegans?

nikkisingh111333 avatar Nov 09 '21 17:11 nikkisingh111333

Can you provide us with the dataset you are trying to load (the whole folder, maybe)? Maybe upload it as a zip to your own GitHub, or get it to me in any other way?

tneuer avatar Nov 09 '21 20:11 tneuer

Here's the dataset I want to load, Google Drive link: Gestures

nikkisingh111333 avatar Nov 10 '21 10:11 nikkisingh111333