Custom dataset
Hi, is there a way to use this model to train conditional image generation (from text or other conditioning) on a custom dataset? Thanks
Bumping this thread. I replicated the important bits from the conditional imagenet config and found one issue I couldn't get past.
First you need to extend your custom data loader to provide the conditional information you want. For me that was class labels, which was pretty easy to add by following the ImageNet data loader (see the sketch below). For text, it's probably useful to look at the config files in models/ to see that some of them use a BERTEmbedder.
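For reference, here is a minimal sketch of what such a class-conditional dataset could look like. The dict keys ("image", "class_label") and the HWC float image in [-1, 1] follow the pattern of the ImageNet loaders in ldm/data/; the class name, constructor arguments, and paths here are placeholders, not anything from the repo.

```python
# Minimal sketch of a class-conditional dataset, modeled on the ImageNet
# loaders in ldm/data/. Assumed convention: "image" is an HWC float32 array
# in [-1, 1] and "class_label" is an integer class id.
import numpy as np
from PIL import Image
from torch.utils.data import Dataset


class CustomConditionalDataset(Dataset):
    def __init__(self, image_paths, labels, size=256):
        # image_paths: list of file paths; labels: list of integer class ids
        self.image_paths = image_paths
        self.labels = labels
        self.size = size

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, i):
        image = Image.open(self.image_paths[i]).convert("RGB")
        image = image.resize((self.size, self.size), Image.BICUBIC)
        image = np.array(image).astype(np.float32) / 127.5 - 1.0  # HWC, [-1, 1]
        return {
            "image": image,
            # picked up by the conditioning stage when cond_stage_key matches this key
            "class_label": self.labels[i],
        }
```

Whatever key you use for the label has to match the model's cond_stage_key in the config (the conditional ImageNet configs use class_label).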
I found that during training, attempting to log images through the image_logger callback resulted in the following cuDNN error. PyTorch appends a generated repro script listing the operations that supposedly trigger the error; I'm pretty sure the contents of that generated script don't matter here.
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
From the traceback, the error starts at log_images:
File "/home/latent-diffusion/main.py", line 386, in on_train_batch_end
self.log_img(pl_module, batch, batch_idx, split="train")
File "/home/latent-diffusion/main.py", line 353, in log_img
images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)
... [ down to running the diffusion model (first stage decoder) ]
return self.first_stage_model.decode(z, force_not_quantize=predict_cids or force_not_quantize)
File "/home/latent-diffusion/ldm/models/autoencoder.py", line 281, in decode
dec = self.decoder(quant)
The actual error stems from a conv2d inside a self.up call within the decoder. I can post the full trace in a gist if that's helpful.
My guess is that it's not currently feasible to run image logging during training with a conditional LDM, given that there's a whole notebook dedicated to inference with a conditional ImageNet LDM. But if that's not actually the case, or if it's an easy modification, I'd like to know.
Thanks!