
Question regarding the comparison with the DCGAN

Open tonyyunyang opened this issue 1 year ago • 1 comment

Hi, I love your videos; I have literally watched every second of them. In the video you said that the discriminator can now handle inputs of all sizes. What does that mean? I was hoping you could elaborate on that a bit.

https://github.com/explainingai-code/StableDiffusion-PyTorch/blob/ac8fb10825aae4416e099900cb4d2919732222ae/models/discriminator.py#L5

tonyyunyang avatar Oct 27 '24 16:10 tonyyunyang

Thank you so much for your support :) A regular DCGAN discriminator maps an input of, say, shape 256x256 to a single scalar output, so when you need to feed images of a different size (e.g. 512x512) you would need to change the architecture, adding more layers to reach the same single scalar output on the larger input. A PatchGAN discriminator instead maps its input to an array of predictions, one per NxN patch. This means that even when you change the discriminator input from 256x256 to 512x512, you don't need to change anything in the architecture; the only difference is that the discriminator now generates predictions for roughly four times as many patches as before. That is what I was referring to in the video. Hope it's clearer now.
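To make this concrete, here is a minimal sketch of a PatchGAN-style discriminator (a generic convolutional stack, not necessarily the exact architecture in this repository). Because it is fully convolutional, the same module accepts both 256x256 and 512x512 inputs; only the spatial size of the output patch grid changes.

```python
import torch
import torch.nn as nn


class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator: no flatten/linear layers,
    so it works on any input size. Layer counts/channels here are
    illustrative, not the repo's exact configuration."""

    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Each stride-2 conv halves the spatial resolution.
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            # Final stride-1 conv produces one real/fake logit per patch.
            nn.Conv2d(256, 1, kernel_size=4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.net(x)


d = PatchDiscriminator()
out_256 = d(torch.randn(1, 3, 256, 256))
out_512 = d(torch.randn(1, 3, 512, 512))
# 256x256 input -> 31x31 patch grid; 512x512 input -> 63x63 patch grid.
print(out_256.shape, out_512.shape)
```

Doubling the input side length roughly quadruples the number of patch predictions (31x31 = 961 vs 63x63 = 3969), with no architectural change, which is the property described above.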

explainingai-code avatar Oct 28 '24 04:10 explainingai-code