Multiple images
Hi! Thank you so much for providing the code. The video course is amazing, really helpful.
I have a question: is it possible to modify the code so that we can use more than one image as the input (condition) and as the target data? In other words, can we do, e.g., 2-images-to-3-images, taking 2 images to predict 3 images?
Thanks again!
Hi, @cpnovaes, thank you!
Indeed, that would require some changes to the code examples, but the current conditional model already allows you to specify condition_channels at initialisation (see https://github.com/mikonvergence/DiffusionFastForward/blob/master/src/PixelDiffusion.py):
class PixelDiffusionConditional(PixelDiffusion):
    def __init__(self,
                 train_dataset,
                 valid_dataset=None,
                 condition_channels=3,  # <- here
                 batch_size=1,
                 lr=1e-3):
This means that you can potentially set the condition_channels parameter to 6 for 2 RGB images and reuse the same framework (when passing through the network, you need to concatenate the conditions along the channel dimension: torch.cat([condition_1, condition_2], 1)).
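A minimal sketch of that idea (the tensor shapes are only illustrative, and train_dataset stands in for whatever dataset you build):

import torch
from src.PixelDiffusion import PixelDiffusionConditional

# Two RGB conditioning batches of shape (B, 3, H, W)
condition_1 = torch.randn(4, 3, 64, 64)
condition_2 = torch.randn(4, 3, 64, 64)

# Stack them along the channel dimension -> (4, 6, 64, 64)
condition = torch.cat([condition_1, condition_2], 1)

# The model then just needs to expect 6 condition channels
model = PixelDiffusionConditional(train_dataset,
                                  condition_channels=6)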
I am not entirely sure if this is what you're looking for, so let me know (and ideally provide some data examples) if this needs further discussion! Thanks again
Hi @mikonvergence, thanks a lot for answering my question!
I have been trying to modify the code following your suggestion, which is what I was looking for. My case is the following: I give 2 images as input and 1 image as output (what I want to predict at the end; it is related to the 2 input images). These images are not RGB, but simple (128, 128) matrices (.npy files). Here is an example:
I figured that, in the case of input and output having a different number of channels, I need to do the following:
class PixelDiffusionConditional(PixelDiffusion):
    def __init__(self,
                 train_dataset,
                 valid_dataset=None,
                 condition_channels=3,  # <- here
                 generated_channels=3,  # <- also here!
                 batch_size=1,
                 lr=1e-3):
and use condition_channels=2 and generated_channels=1. Modifying the SimpleImageDataset(Dataset) class accordingly, my data will have the shapes train_ds[0][0].shape = torch.Size([2, 64, 64]) and train_ds[0][1].shape = torch.Size([1, 64, 64]).
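Concretely, I am thinking of something along these lines (just a sketch; the class and argument names are hypothetical, and the pair ordering follows the shapes above):

import numpy as np
import torch
from torch.utils.data import Dataset

class NpyPairDataset(Dataset):
    """Hypothetical dataset: 2-channel condition in, 1-channel target out."""
    def __init__(self, condition_paths, target_paths):
        self.condition_paths = condition_paths  # each .npy stores a (2, 64, 64) array
        self.target_paths = target_paths        # each .npy stores a (64, 64) array

    def __len__(self):
        return len(self.target_paths)

    def __getitem__(self, idx):
        condition = torch.from_numpy(np.load(self.condition_paths[idx])).float()         # (2, 64, 64)
        target = torch.from_numpy(np.load(self.target_paths[idx])).float().unsqueeze(0)  # (1, 64, 64)
        return condition, target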
Please let me know if that makes sense or if I may be missing something.
In the case of Conditional Latent Diffusion, I am still trying to implement a similar idea, but I am having problems making the autoencoder accept a different number of channels. In principle, I could follow the same idea, right?
Thanks!
Hi @cpnovaes! That's exactly the right approach with the PixelDiffusion type.
For the latent diffusion, that will be tricky, because:
- The autoencoder has been trained on natural images with losses that promote the aesthetic quality of images, so it might not be ideal for compressing other types of signals
- As you said, it is designed to work with 3 channels (RGB) - you could potentially encode each signal (2 conditions and 1 generated, all single-channel as I understand) by feeding each one to the encoder as a 'greyscale' image (assuming your values are bounded and can be mapped to the [-1, +1] range); see the short sketch at the end of this reply
However, if your signals are only 64 by 64, there may be less need for a latent diffusion approach. If you later plan to work with larger matrices, then I would suggest finetuning your own autoencoder, but that's outside the scope of this course. I am always happy to provide hints here though, so feel free to continue this thread.
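For the greyscale trick mentioned above, a rough sketch (the autoencoder itself is not loaded here, since the exact encode call depends on which pretrained model you use; pass in whatever encode function it exposes):

import torch

def encode_single_channel(signal, ae_encode):
    # signal: (B, 1, H, W) tensor with values already mapped to [-1, +1]
    # ae_encode: the encode function of your pretrained RGB autoencoder
    rgb_like = signal.repeat(1, 3, 1, 1)  # replicate the channel -> 'greyscale' RGB image
    return ae_encode(rgb_like)

# Usage sketch (replace the identity lambda with the real encoder):
signal = torch.rand(4, 1, 128, 128) * 2 - 1
latent = encode_single_channel(signal, ae_encode=lambda x: x)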
Hi @mikonvergence!
Thank you so much, I have learned a lot from all your comments!
My signals are 128x128, but I am testing PixelDiffusion on them. In the meantime, I am also studying a possible implementation of the autoencoder.
Thanks again!