Multiple images
Hi! Thank you so much for providing the code. The video course is amazing, really helpful.
I have a question: is it possible to modify the code so that we can use more than one image as the input (condition) and as the target data? In other words, can we do, e.g., 2-images-to-3-images, taking 2 images to predict 3 images?
Thanks again!
Hi, @cpnovaes, thank you!
Indeed, that would require some changes to the code examples, but the current conditional model already allows you to specify condition_channels at initialisation (see https://github.com/mikonvergence/DiffusionFastForward/blob/master/src/PixelDiffusion.py):
class PixelDiffusionConditional(PixelDiffusion):
    def __init__(self,
                 train_dataset,
                 valid_dataset=None,
                 condition_channels=3,  # <- here
                 batch_size=1,
                 lr=1e-3):
This means that you can potentially set the condition_channels parameter to 6 for 2 RGB images and reuse the same framework (when passing through the network, you need to concatenate the conditions along the channel dimension: torch.cat([condition_1, condition_2], 1)).
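A minimal sketch of that idea (the tensor shapes are only illustrative, and train_dataset stands in for whatever dataset you build):

import torch
from src.PixelDiffusion import PixelDiffusionConditional

# Two RGB conditioning batches of shape (B, 3, H, W)
condition_1 = torch.randn(4, 3, 64, 64)
condition_2 = torch.randn(4, 3, 64, 64)

# Stack them along the channel dimension -> (4, 6, 64, 64)
condition = torch.cat([condition_1, condition_2], 1)

# The model then just needs to expect 6 condition channels
model = PixelDiffusionConditional(train_dataset,
                                  condition_channels=6)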
I am not entirely sure if this is what you're looking for, so let me know (and ideally provide some data examples) if this needs further discussion! Thanks again
Hi @mikonvergence, thanks a lot for answering my question!
I have been trying to modify the code following your suggestion, which is what I was looking for. My case is the following: I give 2 images as input and 1 image as output (what I want to predict at the end; it is related to the 2 input images). These images are not RGB, but simple (128, 128) matrices (.npy files). Here is an example:
I figured that, in the case of input and output having a different number of channels, I need to do the following:
class PixelDiffusionConditional(PixelDiffusion):
    def __init__(self,
                 train_dataset,
                 valid_dataset=None,
                 condition_channels=3,  # <- here
                 generated_channels=3,  # <- also here!
                 batch_size=1,
                 lr=1e-3):
and use condition_channels=2 and generated_channels=1. Modifying the SimpleImageDataset(Dataset) class accordingly, my data will have the shapes train_ds[0][0].shape = torch.Size([2, 64, 64]) and train_ds[0][1].shape = torch.Size([1, 64, 64]).
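Concretely, I am thinking of something along these lines (just a sketch; the class and argument names are hypothetical, and the pair ordering follows the shapes above):

import numpy as np
import torch
from torch.utils.data import Dataset

class NpyPairDataset(Dataset):
    """Hypothetical dataset: 2-channel condition in, 1-channel target out."""
    def __init__(self, condition_paths, target_paths):
        self.condition_paths = condition_paths  # each .npy stores a (2, 64, 64) array
        self.target_paths = target_paths        # each .npy stores a (64, 64) array

    def __len__(self):
        return len(self.target_paths)

    def __getitem__(self, idx):
        condition = torch.from_numpy(np.load(self.condition_paths[idx])).float()         # (2, 64, 64)
        target = torch.from_numpy(np.load(self.target_paths[idx])).float().unsqueeze(0)  # (1, 64, 64)
        return condition, target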
Please let me know if that makes sense or if I may be missing something.
In the case of Conditional Latent Diffusion, I am still trying to implement a similar idea, but I am having problems making the autoencoder accept a different number of channels. In principle, I could follow the same idea, right?
Thanks!
Hi @cpnovaes! That's exactly the right approach with the PixelDiffusion type.
For the latent diffusion, that will be tricky, because:
- The autoencoder has been trained on natural images with losses that promote the aesthetic quality of images, so it might not be ideal for compressing other types of signals
- As you said, it is designed to work with 3 channels (RGB) - you could potentially encode each signal (2 conditions and 1 generated, all single-channel as I understand) by feeding each one to the encoder as a 'greyscale' image (assuming your values are bounded and can be mapped to the [-1, +1] range); see the short sketch at the end of this reply
However, if your signals are only 64 by 64, there may be less need for a latent diffusion approach. If you later plan to work with larger matrices, then I would suggest finetuning your own autoencoder, but that's outside the scope of this course. I am always happy to provide hints here though, so feel free to continue this thread.
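For the greyscale trick mentioned above, a rough sketch (the autoencoder itself is not loaded here, since the exact encode call depends on which pretrained model you use; pass in whatever encode function it exposes):

import torch

def encode_single_channel(signal, ae_encode):
    # signal: (B, 1, H, W) tensor with values already mapped to [-1, +1]
    # ae_encode: the encode function of your pretrained RGB autoencoder
    rgb_like = signal.repeat(1, 3, 1, 1)  # replicate the channel -> 'greyscale' RGB image
    return ae_encode(rgb_like)

# Usage sketch (replace the identity lambda with the real encoder):
signal = torch.rand(4, 1, 128, 128) * 2 - 1
latent = encode_single_channel(signal, ae_encode=lambda x: x)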
Hi @mikonvergence!
Thank you so much, I have learned a lot from all your comments!
My signals are 128x128, but I am testing PixelDiffusion on them. In the meantime, I am also studying a possible implementation of the autoencoder.
Thanks again!