Equivalence between generator and extracted kernel

Open michaelshiyu opened this issue 5 years ago • 22 comments

Hi,

Thanks a lot for this great work! I have a quick question regarding the paper.

If I'm understanding it correctly, the idea is that the generator of KernelGAN can be always equated to a single kernel, which can be obtained via, e.g., KernelGAN.calc_curr_k. But do you mean that this equivalence is exact? In other words, the output of the generator is always exactly equal to convolving with this single kernel?

I tried to test this but from what I saw they do not seem to be the same. Can you please enlighten me on this? Many thanks in advance.

michaelshiyu avatar Jun 30 '20 18:06 michaelshiyu

Yes, G is always exactly equivalent to a convolution with a single kernel. Since there are no non-linear activations, the sequence of convolutions (which are LINEAR!) can be replaced with a single kernel, similar to x * 2 * 3 * 6 = x * 36. To verify, simply initialize G with random weights, pass an image through the network, and compare the result to OUR resize function (or Matlab's) with the computed kernel. You should get the exact same result.
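As a small sanity check of the linearity argument (not the repository's code), here is a 1-D NumPy sketch. The filters `a`, `b`, `c` and the signal `x` are made-up random arrays; the only thing it shows is that applying three convolutions in sequence gives the same result as one convolution with their combined kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(32)  # hypothetical 1-D "image"
a, b, c = rng.standard_normal(7), rng.standard_normal(5), rng.standard_normal(3)

# Apply the three filters one after another (full convolution).
step_by_step = np.convolve(np.convolve(np.convolve(x, a), b), c)

# Collapse the three filters into one kernel first, then apply it once.
single_kernel = np.convolve(np.convolve(a, b), c)  # length 7 + 5 + 3 - 2 = 13
all_at_once = np.convolve(x, single_kernel)

print(single_kernel.shape)
print(np.allclose(step_by_step, all_at_once))
```

The combined kernel has length 13, which matches G_kernel_size = 13 for a 7, 5, 3, 1, 1, 1 structure.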

sefibk avatar Jul 01 '20 07:07 sefibk

Thanks a lot for answering!

I agree that a sequence of linear convolution layers can be collapsed into a single one, but is there a simple way to see that this filter is exactly the one we obtain by running calc_curr_k?

Also, I tried testing this numerically. Here's a minimal script replicating what I did:

import os
import numpy as np
from easydict import EasyDict as edict
import torch

os.chdir('/path/to/KernelGAN/')
import loss
import networks
import torch.nn.functional as F
from util import save_final_kernel, run_zssr, post_process_k, kernel_shift
from imresize import imresize

# set config
d = edict()
d.input_crop_size = 64
d.scale_factor = .5
d.G_chan = 64
d.G_kernel_size = 13
d.D_chan = 64
d.D_n_layers = 7
d.D_kernel_size = 7
d.G_structure = [7, 5, 3, 1, 1, 1]

# re-define a simpler version of KernelGAN class
class KernelGAN:
    def __init__(self, conf):
        # Acquire configuration
        self.conf = conf

        # Define the GAN
        self.G = networks.Generator(conf)
        # self.D = networks.Discriminator(conf)

        # The kernel G is imitating
        self.curr_k = torch.FloatTensor(conf.G_kernel_size, conf.G_kernel_size)

    def calc_curr_k(self):
        """given a generator network, the function calculates the kernel it is imitating"""
        delta = torch.Tensor([1.]).unsqueeze(0).unsqueeze(-1).unsqueeze(-1)
        for ind, w in enumerate(self.G.parameters()):
            curr_k = F.conv2d(delta, w, padding=self.conf.G_kernel_size - 1) if ind == 0 else F.conv2d(curr_k, w)
        self.curr_k = curr_k.squeeze().flip([0, 1])

# init network, get kernel
net = KernelGAN(d)
net.calc_curr_k()

# get a random image patch
img = np.random.rand(64, 64, 3).astype(np.float32)

# get rescaled images w/ downscaling factor 2
# k = kernel_shift(net.curr_k.detach().float().numpy(), 2)
k = net.curr_k.detach().float().numpy()
net_out = net.G(torch.from_numpy(img).view(1, 3, 64, 64)).detach()
k_out = imresize(img, .5, kernel=k).transpose((2, 0, 1))

# compare shapes
print(net_out.size())  # torch.Size([1, 3, 26, 26])
print(k_out.shape)  # (3, 32, 32)

# compare the middle two rows of the output images
print(net_out[0][0][12:14])
print(k_out[0][15:17])

And the outputs are

tensor([[-0.0034, -0.0152, -0.0061, -0.0219, -0.0208, -0.0114, -0.0156, -0.0264,
         -0.0260, -0.0195, -0.0092, -0.0163, -0.0102, -0.0141, -0.0291, -0.0120,
         -0.0058, -0.0234, -0.0333, -0.0195, -0.0151, -0.0302, -0.0316, -0.0204,
         -0.0284,  0.0142],
        [-0.0317, -0.0024, -0.0353, -0.0156, -0.0192, -0.0156, -0.0207, -0.0327,
         -0.0191, -0.0184, -0.0149, -0.0182, -0.0242, -0.0179, -0.0219, -0.0193,
         -0.0055, -0.0402, -0.0166, -0.0137, -0.0180, -0.0203,  0.0003,  0.0034,
         -0.0219, -0.0291]])

[[-0.01040298 -0.02667889 -0.01370732 -0.0288636  -0.02506967 -0.01364688
  -0.00924972 -0.00425742 -0.01827828 -0.00914193 -0.02248633 -0.00315671
  -0.00693536 -0.01682552 -0.02707781 -0.02352768 -0.0201562  -0.01160016
  -0.01983571 -0.01985664 -0.022903   -0.03139445 -0.0211829  -0.00792672
  -0.00433569 -0.00781635 -0.01967012 -0.01601905 -0.01832438 -0.02570574
  -0.02028925 -0.0125796 ]
 [-0.01464794 -0.01586494 -0.02722405 -0.01068424 -0.02654948 -0.01590517
  -0.04187544  0.00326075 -0.00151808 -0.02657411 -0.02184494 -0.01424225
  -0.00618783 -0.02599089 -0.02297904 -0.01945422 -0.03042439 -0.01876058
  -0.02300953 -0.01963425 -0.00840907 -0.00931074 -0.01729889 -0.00336883
  -0.01440977 -0.02402707 -0.00701743  0.00232169 -0.01451722 -0.02746969
  -0.01549856 -0.02331167]]

As you can see, they don't seem to agree. They also do not agree if I use the shifted kernel obtained via kernel_shift. Have I made a mistake somewhere in this script? If so, could you kindly point it out?
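For anyone comparing the two shapes printed above, the arithmetic below is one way to account for 26 versus 32. It is only a sketch: it assumes every layer in G is an unpadded ("valid") convolution and that the last 1x1 layer carries the stride of 2, which is consistent with the printed size but is not taken from this thread. `valid_out` is a hypothetical helper.

```python
def valid_out(size, kernel, stride=1):
    """Output length of an unpadded convolution along one axis."""
    return (size - kernel) // stride + 1

size = 64
for k in [7, 5, 3, 1, 1]:             # all but the last entry of G_structure
    size = valid_out(size, k)
net_size = valid_out(size, 1, stride=2)  # assumption: the last layer has stride 2

resize_size = int(64 * 0.5)              # imresize keeps the borders
border = (resize_size - net_size) // 2   # pixels cropped on each side by G

print(net_size, resize_size, border)
```

Under these assumptions the network output corresponds to the resized image with a 3-pixel border removed, so rows 12:14 of net_out line up with rows 15:17 of k_out, as in the script.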

Thanks again and I appreciate your help!

michaelshiyu avatar Jul 01 '20 17:07 michaelshiyu

post_process_k function

jnoylinc avatar Jul 03 '20 01:07 jnoylinc

@jnoylinc post_process_k does two things. First, it zeros out negligible values in the extracted kernel, which, if anything, only makes the kernel deviate further from the generator. Second, it calls the kernel_shift function on the extracted kernel, which I tried (commented out in the above snippet), but the kernel still did not produce the same output as the network.

michaelshiyu avatar Jul 03 '20 23:07 michaelshiyu

Yes, I ran into the same question. Have you solved it?

1214635079 avatar Aug 06 '20 07:08 1214635079

No, it is not important for me.

jnoylinc avatar Aug 06 '20 09:08 jnoylinc

Hi @1214635079, I haven't. I might be wrong here but I do not think the extracted kernel is the same as the learned generator.

Hi @sefibk, have you got a chance to look into this? Thanks!

michaelshiyu avatar Aug 06 '20 13:08 michaelshiyu

I am sorry but I don't have the time to thoroughly check it. Your script seems right. I verified correctness when the model was developed, but since it was a while ago, I don't recall if there was something special about it.

sefibk avatar Aug 07 '20 07:08 sefibk

I have checked the code; the author is right.

jnoylinc avatar Aug 10 '20 06:08 jnoylinc

Great news @jnoylinc. thanks!

sefibk avatar Aug 10 '20 07:08 sefibk

Nice! I believe G is always equivalent to a convolution with a single kernel, since all the layers are linear. But can you tell me what the problem is in @michaelshiyu's code? Or can you attach your code here? Thanks!

1214635079 avatar Aug 11 '20 09:08 1214635079

@1214635079 I agree that a sequence of conv layers can be collapsed into a single conv layer if there are no nonlinearities involved. My concern is that this single conv kernel is not the same as the one computed using the given code, as demonstrated in my script above.

@jnoylinc Please post a minimal executable script so that we can reproduce your findings. Thanks!

michaelshiyu avatar Aug 12 '20 07:08 michaelshiyu

I also can't understand the function calc_curr_k. Have you solved it?

liuweiyy avatar Sep 04 '20 04:09 liuweiyy

@liuweiyy nope. Still waiting for the author to address this issue.

michaelshiyu avatar Sep 04 '20 04:09 michaelshiyu

Same question. I am looking for a mathematical proof or some papers about this.

ZhuoranLyu avatar Sep 04 '20 10:09 ZhuoranLyu

I didn't realize you were waiting on me. Regarding the idea: mathematically, if you pass a delta kernel (1 in the center and zero elsewhere) through a sequence of kernels (and flip the result), you get a single kernel equivalent to the sequence. That is the motivation for the function. If that is what you don't understand, point it out and I can elaborate. If you think there is a problem in the implementation, point it out and I can try to re-check. However, it is very straightforward and I checked it many times before publication. In addition, @jnoylinc points out that he verified it is correct.
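To make the delta-and-flip idea concrete, here is a 1-D NumPy sketch that mirrors the structure of calc_curr_k without using the repository code. It assumes the layers compute cross-correlation (which is what deep learning frameworks call "conv"); `xcorr`, the filters `w1`, `w2`, `w3`, and the signal `x` are hypothetical names and random data chosen for illustration.

```python
import numpy as np

def xcorr(x, w):
    """Unpadded cross-correlation, the 1-D analogue of a bias-free conv layer."""
    return np.correlate(x, w, mode="valid")

rng = np.random.default_rng(0)
filters = [rng.standard_normal(n) for n in (7, 5, 3)]  # w1, w2, w3
K = 7 + 5 + 3 - 2                                       # combined support: 13

# A single 1 padded with K - 1 zeros on each side, as in calc_curr_k.
delta = np.zeros(2 * K - 1)
delta[K - 1] = 1.0

# Pass the delta through the sequence of filters, then flip the result.
resp = delta
for w in filters:
    resp = xcorr(resp, w)
k = resp[::-1]

# Compare: sequence of layers vs. one cross-correlation with k.
x = rng.standard_normal(64)
seq = x
for w in filters:
    seq = xcorr(seq, w)
single = xcorr(x, k)

print(k.shape, seq.shape)
print(np.allclose(seq, single))
```

In this setting the flipped response equals the full convolution of the three filters, and one cross-correlation with it reproduces the sequence exactly. Whether a 2-D test against imresize matches also depends on padding, stride, and data layout, which this sketch does not cover.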

sefibk avatar Sep 04 '20 12:09 sefibk

Thank you.

liuweiyy avatar Sep 04 '20 12:09 liuweiyy

@sefibk Thanks for the reply.

It'd be great if you could elaborate on the motivation part. Pointing us to a full proof somewhere on this would also be much appreciated. Thanks in advance.

Regarding the implementation, I demonstrated with the script I posted above that the kernel computed with calc_curr_k does not agree with the actual kernel. I think that serves as a counterexample to the correctness of the implementation, unless someone points out where I went wrong in that script.

@jnoylinc said he/she verified, and I asked above for a reproducible implementation, but he/she did not respond.

michaelshiyu avatar Sep 04 '20 13:09 michaelshiyu

There are some bugs in your code, be careful. My code has been deleted, I am sorry.

jnoylinc avatar Sep 05 '20 12:09 jnoylinc

Regarding "theory": it is nothing complicated. We are composing linear operations, so we can definitely represent a sequence of convolutions as a single one. I don't have a proof for that, obviously, since it is very trivial. The 'delta' was used just as a simple way to implement the idea and get the kernel in our hands.

If I am not mistaken your code tests correctness on images organized as channels last. Try testing it with channels first (e.g. shape of (1, 3, 224, 224))
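A short NumPy illustration of the layout point, as a sketch rather than a diagnosis of the script: reshaping a channels-last array to a channels-first shape only reinterprets the memory order, while a transpose actually moves the channel axis. torch's `.view` keeps memory order in the same way as `reshape` here. The tiny 4x4 image is made up for readability.

```python
import numpy as np

# Hypothetical channels-last image of shape (H, W, C) = (4, 4, 3).
img = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)

# Reinterpreting the buffer: same bytes, new shape, channels get interleaved.
viewed = img.reshape(1, 3, 4, 4)

# Moving the channel axis to the front, then adding a batch dimension.
moved = img.transpose(2, 0, 1)[None]

print(np.array_equal(viewed, moved))            # the two layouts differ
print(np.array_equal(moved[0, 0], img[:, :, 0]))  # transpose keeps channel 0 intact
```

If the comparison in the script is rerun, converting with a transpose (or `permute` in torch) before feeding G would rule this out as a source of mismatch.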

sefibk avatar Sep 06 '20 07:09 sefibk

We can view conv op as matrix multiplication, then everything will be easy to understand.
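A minimal 1-D sketch of the matrix view, under the assumption that each layer is an unpadded cross-correlation: every layer becomes a banded matrix, and the product of two such matrices is again a banded matrix built from a single combined kernel. `corr_matrix` and the random filters are hypothetical helpers for illustration.

```python
import numpy as np

def corr_matrix(w, n):
    """Matrix M with M @ x == np.correlate(x, w, mode='valid') for len(x) == n."""
    m = n - len(w) + 1
    M = np.zeros((m, n))
    for i in range(m):
        M[i, i:i + len(w)] = w  # each row is the filter shifted by one position
    return M

rng = np.random.default_rng(0)
n = 16
w1, w2 = rng.standard_normal(5), rng.standard_normal(3)

M1 = corr_matrix(w1, n)                 # first layer:  16 -> 12
M2 = corr_matrix(w2, n - len(w1) + 1)   # second layer: 12 -> 10

k = np.convolve(w1, w2)                 # combined kernel, length 7
Mk = corr_matrix(k, n)                  # single layer: 16 -> 10

print(np.allclose(M2 @ M1, Mk))
```

Since matrix multiplication is associative, the same argument extends to any number of linear layers.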

ZhuoranLyu avatar Sep 07 '20 02:09 ZhuoranLyu

@ZhuoranLyu - Thank you for the assistance - that is definitely true

sefibk avatar Sep 07 '20 06:09 sefibk