KernelGAN Equivalence between generator and extracted kernel

Hi,

Thanks a lot for this great work! I have a quick question regarding the paper.

If I'm understanding it correctly, the idea is that the generator of KernelGAN can always be equated to a single kernel, which can be obtained via, e.g., KernelGAN.calc_curr_k. But do you mean that this equivalence is exact? In other words, is the output of the generator always exactly equal to the result of convolving with this single kernel?

I tried to test this but from what I saw they do not seem to be the same. Can you please enlighten me on this? Many thanks in advance.

Jun 30 '20 18:06 michaelshiyu

Yes, G is always exactly equivalent to a convolution with a single kernel. Since there are no non-linear activations, the sequence of convolutions (which are LINEAR!!!) can be replaced with a single kernel, similar to x * 2 * 3 * 6 = x * 36. To verify, simply initialize G with random weights, pass an image through the network, and compare the result to OUR resize function (or Matlab's) with the computed kernel. You should get the exact same result.
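A minimal 1D sketch of the collapsing argument, assuming NumPy and three random kernels of sizes 7, 5 and 3 as stand-ins for the generator's layers (it does not use the KernelGAN code or its resize function):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)  # stand-in for one image row
k1, k2, k3 = (rng.standard_normal(s) for s in (7, 5, 3))  # hypothetical layers

# Apply the three linear convolutions one after another, without padding.
y_seq = x
for k in (k1, k2, k3):
    y_seq = np.convolve(y_seq, k, mode="valid")

# Collapse the kernels first (full convolution), then apply once.
k_single = np.convolve(np.convolve(k1, k2), k3)  # length 7 + 5 + 3 - 2 = 13
y_single = np.convolve(x, k_single, mode="valid")

max_diff = np.max(np.abs(y_seq - y_single))
print(k_single.shape, y_seq.shape, max_diff)
```

The two outputs agree up to floating-point rounding, which is the sense in which "exactly" should be read when comparing numerically.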

Jul 01 '20 07:07 sefibk

Thanks a lot for answering!

I agree that a sequence of linear convolution layers can be collapsed into a single one, but is there a simple way to see that this filter is exactly the one we obtain by running calc_curr_k?

Also, I tried testing this numerically. Here's a minimal script replicating what I did:

import os
import numpy as np
from easydict import EasyDict as edict
import torch

os.chdir('/path/to/KernelGAN/')
import loss
import networks
import torch.nn.functional as F
from util import save_final_kernel, run_zssr, post_process_k, kernel_shift
from imresize import imresize

# set config
d = edict()
d.input_crop_size = 64
d.scale_factor = .5
d.G_chan = 64
d.G_kernel_size = 13
d.D_chan = 64
d.D_n_layers = 7
d.D_kernel_size = 7
d.G_structure = [7, 5, 3, 1, 1, 1]

# re-define a simpler version of KernelGAN class
class KernelGAN:
    def __init__(self, conf):
        # Acquire configuration
        self.conf = conf

        # Define the GAN
        self.G = networks.Generator(conf)
        # self.D = networks.Discriminator(conf)

        # The kernel G is imitating
        self.curr_k = torch.FloatTensor(conf.G_kernel_size, conf.G_kernel_size)

    def calc_curr_k(self):
        """given a generator network, the function calculates the kernel it is imitating"""
        delta = torch.Tensor([1.]).unsqueeze(0).unsqueeze(-1).unsqueeze(-1)
        for ind, w in enumerate(self.G.parameters()):
            curr_k = F.conv2d(delta, w, padding=self.conf.G_kernel_size - 1) if ind == 0 else F.conv2d(curr_k, w)
        self.curr_k = curr_k.squeeze().flip([0, 1])

# init network, get kernel
net = KernelGAN(d)
net.calc_curr_k()

# get a random image patch
img = np.random.rand(64, 64, 3).astype(np.float32)

# get rescaled images w/ downscaling factor 2
# k = kernel_shift(net.curr_k.detach().float().numpy(), 2)
k = net.curr_k.detach().float().numpy()
net_out = net.G(torch.from_numpy(img).view(1, 3, 64, 64)).detach()
k_out = imresize(img, .5, kernel=k).transpose((2, 0, 1))

# compare shapes
print(net_out.size())  # torch.Size([1, 3, 26, 26])
print(k_out.shape)  # (3, 32, 32)

# compare the middle two rows of the output images
print(net_out[0][0][12:14])
print(k_out[0][15:17])

And the outputs are

tensor([[-0.0034, -0.0152, -0.0061, -0.0219, -0.0208, -0.0114, -0.0156, -0.0264,
         -0.0260, -0.0195, -0.0092, -0.0163, -0.0102, -0.0141, -0.0291, -0.0120,
         -0.0058, -0.0234, -0.0333, -0.0195, -0.0151, -0.0302, -0.0316, -0.0204,
         -0.0284,  0.0142],
        [-0.0317, -0.0024, -0.0353, -0.0156, -0.0192, -0.0156, -0.0207, -0.0327,
         -0.0191, -0.0184, -0.0149, -0.0182, -0.0242, -0.0179, -0.0219, -0.0193,
         -0.0055, -0.0402, -0.0166, -0.0137, -0.0180, -0.0203,  0.0003,  0.0034,
         -0.0219, -0.0291]])

[[-0.01040298 -0.02667889 -0.01370732 -0.0288636  -0.02506967 -0.01364688
  -0.00924972 -0.00425742 -0.01827828 -0.00914193 -0.02248633 -0.00315671
  -0.00693536 -0.01682552 -0.02707781 -0.02352768 -0.0201562  -0.01160016
  -0.01983571 -0.01985664 -0.022903   -0.03139445 -0.0211829  -0.00792672
  -0.00433569 -0.00781635 -0.01967012 -0.01601905 -0.01832438 -0.02570574
  -0.02028925 -0.0125796 ]
 [-0.01464794 -0.01586494 -0.02722405 -0.01068424 -0.02654948 -0.01590517
  -0.04187544  0.00326075 -0.00151808 -0.02657411 -0.02184494 -0.01424225
  -0.00618783 -0.02599089 -0.02297904 -0.01945422 -0.03042439 -0.01876058
  -0.02300953 -0.01963425 -0.00840907 -0.00931074 -0.01729889 -0.00336883
  -0.01440977 -0.02402707 -0.00701743  0.00232169 -0.01451722 -0.02746969
  -0.01549856 -0.02331167]]

As you can see, they don't seem to agree. They also do not agree if I use the shifted kernel obtained via kernel_shift. Surely I've made a mistake somewhere in this script? Could you kindly point it out, please?

Thanks again and I appreciate your help!

Jul 01 '20 17:07 michaelshiyu

post_process_k function

Jul 03 '20 01:07 jnoylinc

@jnoylinc post_process_k does two things. First, it zeros out negligible values in the extracted kernel, which, if anything, only makes the kernel deviate further from the generator. Second, it calls the kernel_shift function on the extracted kernel, which I tried (commented out in the above snippet), but the kernel still did not produce the same output as the network.

Jul 03 '20 23:07 michaelshiyu

Yes, I ran into the same problem. Have you solved it?

Aug 06 '20 07:08 1214635079

No, it is not important to me.

Aug 06 '20 09:08 jnoylinc

Hi @1214635079, I haven't. I might be wrong here but I do not think the extracted kernel is the same as the learned generator.

Hi @sefibk, have you had a chance to look into this? Thanks!

Aug 06 '20 13:08 michaelshiyu

I am sorry but I don't have the time to thoroughly check it. Your script seems right. I verified correctness when the model was developed, but since it was a while ago, I don't recall if there was something special about it.

Aug 07 '20 07:08 sefibk

I have checked the code; the author is right.

Aug 10 '20 06:08 jnoylinc

Great news, @jnoylinc. Thanks!

Aug 10 '20 07:08 sefibk

Nice! I believe G is always equivalent to a convolution with a single kernel, since all of its layers are linear. But can you tell me what the problem is in @michaelshiyu's code? Or can you attach your code here? Thanks!

Aug 11 '20 09:08 1214635079

@1214635079 I agree that a sequence of conv layers can be collapsed into a single conv layer if there are no nonlinearities involved. My concern is that this single conv kernel is not the same as the one computed using the given code, as demonstrated in my script above.

@jnoylinc Please post a minimal executable script so that we can reproduce your findings. Thanks!

Aug 12 '20 07:08 michaelshiyu

I also can't understand the calc_curr_k function. Have you solved it?

Sep 04 '20 04:09 liuweiyy

@liuweiyy nope. Still waiting for the author to address this issue.

Sep 04 '20 04:09 michaelshiyu

Same question. I'm looking for a mathematical proof or some papers about this.

Sep 04 '20 10:09 ZhuoranLyu

I didn't understand that you were waiting on me... Regarding the idea: mathematically, if you pass a delta kernel (1 in the center and zero elsewhere) through a sequence of kernels (and flip the result), you get a single kernel equivalent to the sequence. That is the motivation for the function. If that is what you don't understand, point it out and I can elaborate. If you think I have a problem in the implementation, point it out and I can try to re-check. However, it is very straightforward and I checked it many times before publication. In addition, @jnoylinc points out that he verified it is correct.
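The delta-and-flip idea can be sketched in 1D as follows. This is only an illustration under stated assumptions: np.correlate stands in for the cross-correlation that F.conv2d computes, the three random kernels are hypothetical, and none of this is the repository's code.

```python
import numpy as np

def corr_valid(x, w):
    # Cross-correlation without padding: out[i] = sum_j x[i + j] * w[j].
    return np.correlate(x, w, mode="valid")

rng = np.random.default_rng(0)
kernels = [rng.standard_normal(s) for s in (7, 5, 3)]  # hypothetical layers
total = sum(len(w) for w in kernels) - len(kernels) + 1  # combined size: 13

# A single 1 padded with (total - 1) zeros on each side.
delta = np.zeros(2 * (total - 1) + 1)
delta[total - 1] = 1.0

# Push the delta through every layer, then flip the response.
response = delta
for w in kernels:
    response = corr_valid(response, w)
k_eq = response[::-1]

# Compare the layer-by-layer output with a single pass using k_eq.
x = rng.standard_normal(64)
y_seq = x
for w in kernels:
    y_seq = corr_valid(y_seq, w)
y_single = corr_valid(x, k_eq)

max_diff = np.max(np.abs(y_seq - y_single))
print(len(k_eq), max_diff)
```

In this sketch the flipped response reproduces the sequence when it is applied as a cross-correlation without padding. A resize routine that pads the borders or samples at a different offset would give a different output size and could differ near the edges.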

Sep 04 '20 12:09 sefibk

Thank you.

Sep 04 '20 12:09 liuweiyy

@sefibk Thanks for the reply.

It'd be great if you could elaborate on the motivation part. Pointing us to a full proof somewhere on this would also be much appreciated. Thanks in advance.

Regarding the implementation, the script I posted above shows that the kernel computed with calc_curr_k does not agree with the actual kernel. I think that disproves the correctness of the implementation, unless someone points out where I went wrong in that script.

@jnoylinc said he/she verified, and I asked above for a reproducible implementation, but he/she did not respond.

Sep 04 '20 13:09 michaelshiyu

There are some bugs in your code, so be careful. My code has been deleted, I am sorry.

Sep 05 '20 12:09 jnoylinc

Regarding "theory": it is nothing complicated. We are only composing linear operations, so we can definitely represent a sequence of convolutions as one. I don't have a proof for that, since it is very trivial. The 'delta' was used just as a simple way to implement the idea and get the kernel in our hands.

If I am not mistaken, your code tests correctness on images organized as channels last. Try testing it with channels first (e.g. a shape of (1, 3, 224, 224)).
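To illustrate the channels-last versus channels-first point, here is a small NumPy sketch on a toy 4x4 image (an assumption for readability). NumPy's reshape plays the role of torch's view and transpose the role of permute: reinterpreting an (H, W, C) array as (C, H, W) does not move the channels, it only relabels the same memory order.

```python
import numpy as np

# Toy channels-last image: H = 4, W = 4, C = 3.
img = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)

# Reinterpreting the buffer keeps the memory order, so channels get mixed.
reinterpreted = img.reshape(3, 4, 4)

# Permuting the axes actually moves the channel axis to the front.
permuted = img.transpose(2, 0, 1)

same = np.array_equal(reinterpreted, permuted)
first_channel_ok = np.array_equal(permuted[0], img[:, :, 0])
first_channel_mixed = np.array_equal(reinterpreted[0], img[:, :, 0])
print(same, first_channel_ok, first_channel_mixed)
```

With tensors, the analogous conversion would be a permute followed by adding a batch dimension, rather than a view to (1, 3, H, W).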

Sep 06 '20 07:09 sefibk

We can view the conv op as a matrix multiplication; then everything is easy to understand.
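A hedged sketch of that matrix view in 1D, assuming NumPy; the helper corr_matrix is hypothetical and written only for this illustration. Each layer becomes a banded (Toeplitz) matrix, and the product of two such matrices is again the matrix of a single kernel.

```python
import numpy as np

def corr_matrix(w, n):
    """Matrix M with M @ x == np.correlate(x, w, mode='valid') for len(x) == n."""
    m = n - len(w) + 1
    M = np.zeros((m, n))
    for i in range(m):
        M[i, i:i + len(w)] = w  # each row is the kernel shifted by one sample
    return M

rng = np.random.default_rng(0)
w1, w2 = rng.standard_normal(5), rng.standard_normal(3)  # hypothetical layers
n = 16

A = corr_matrix(w1, n)                 # first layer: 12 x 16
B = corr_matrix(w2, n - len(w1) + 1)   # second layer: 10 x 12

# The combined kernel is the full convolution of the two kernels.
k = np.convolve(w1, w2)                # length 7
C = corr_matrix(k, n)                  # single layer: 10 x 16

x = rng.standard_normal(n)
print(np.allclose(B @ A, C), np.allclose(B @ (A @ x), C @ x))
```

Because matrix multiplication is associative, B @ (A @ x) equals (B @ A) @ x, which is the one-line version of the collapsing argument.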

Sep 07 '20 02:09 ZhuoranLyu

@ZhuoranLyu - Thank you for the assistance - that is definitely true

Sep 07 '20 06:09 sefibk