
Relationship between channels C and bpp

chenxianghu opened this issue 7 years ago · 20 comments

You said "C=8 channels (compression to 0.072 bpp)" in the Results section. I don't understand the relationship between C and bpp; can you explain it to me?

We compare this image compression effect with BPG. Pipeline: PNG image -> decode -> our encoder -> our quantizer. Result: quantized representation.

The bpp comparison is between our quantized representation and the BPG-encoded image: at the same bpp, whose decoded image quality is better, or at the same decoded image quality, whose bpp is lower, right?

chenxianghu avatar May 26 '18 10:05 chenxianghu

If you read the original paper (https://arxiv.org/pdf/1804.02958.pdf), the upper bound on the bitrate is given by Eq. 5; there dim(w_hat) is determined by the number of channels C.
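For concreteness, a back-of-the-envelope version of that bound, assuming the paper's GC configuration (a 16x-downsampling encoder and L = 5 quantization centers; both values are taken from the paper, not from this repo's code):

    import math

    # Eq. 5 upper bound: bpp <= dim(w_hat) * log2(L) / (W * H)
    W, H = 1024, 512                        # Cityscapes input, resampled as in the paper
    C = 8                                   # bottleneck channels
    L = 5                                   # quantization centers (assumed, per the paper)
    dim_w_hat = (W // 16) * (H // 16) * C   # 64 * 32 * 8 = 16384
    bpp = dim_w_hat * math.log2(L) / (W * H)
    print(bpp)                              # ~0.0726, i.e. the quoted 0.072 bpp for C=8

Doubling C doubles dim(w_hat), so the bpp bound scales linearly with C.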

Justin-Tan avatar May 27 '18 11:05 Justin-Tan

I want to test the performance of this model, so I modified the single_plot function as shown below and then ran your compress.py. There are two steps:

  1. original image -> quantized representation ---- takes 623 ms
  2. quantized representation -> reconstructed image ---- takes 115 ms

Is my test method right? And what timings do you measure on your side?

If I want to realize end-to-end image compression, I think I should save the quantized representation to a file on the sender side and recover the reconstructed image from that file on the receiver side; both sender and receiver should load the same well-trained model. Is my thinking right?

def single_plot(epoch, global_step, sess, model, handle, name, config, single_compress=False):

    real = model.example
    gen = model.reconstruction
    zz = model.z
    start = time.time()
    # Step 1: run the encoder + quantizer to obtain the quantized representation z.
    # r, g = sess.run([real, gen], feed_dict={model.training_phase: True, model.handle: handle})
    r, z = sess.run([real, zz], feed_dict={model.training_phase: True, model.handle: handle})
    print("encoder + quantizer time: {:.3f} s".format(time.time() - start))
    print('z shape:', z.shape)
    # print('z result:', z)
    start = time.time()
    # Step 2: feed z back in and run only the generator to reconstruct the image.
    g = sess.run(gen, feed_dict={model.training_phase: True, model.z: z})
    print("generator time: {:.3f} s".format(time.time() - start))

chenxianghu avatar May 29 '18 01:05 chenxianghu

I tested your pre-trained model. The timings:

  1. original image -> quantized representation ---- about 1.5 s
  2. quantized representation -> reconstructed image ---- about 1 s

These results differ from my earlier numbers because my own input image size is 256x256.

I also tested the model on different images:

  1. image from leftImg8bit/train ---- the result is good
  2. image from leftImg8bit/test ---- the result is worse than for images from the train dir
  3. image from the internet ---- the result is terrible

Can this model be used for compressing arbitrary images?

chenxianghu avatar May 29 '18 09:05 chenxianghu

If you want to compress arbitrary images, train on a large dataset of natural images like ImageNet or the ADE20k dataset. The pretrained model was only trained on the Cityscapes dataset, which is a collection of street scenes from Germany and Switzerland.

The distribution of images in ImageNet/ADE20k is more diverse, so the model will probably take longer to converge. To train on ADE20k, download the dataset from the link in the readme and pass the -ds ADE20k flag:

python3 train.py -ds ADE20k <args>

To train on ImageNet you will have to write your own data loader. I think it will work with the default setup, but you will have to check this.
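A minimal sketch of such a loader, assuming a directory of ImageNet JPEGs and that the model consumes [batch, height, width, 3] float images like the existing pipeline (the file pattern, target size, and normalization are placeholders to check against train.py):

    import tensorflow as tf

    def imagenet_loader(file_pattern, batch_size=1, size=(512, 512)):
        # Stream raw JPEG files through a tf.data pipeline.
        dataset = tf.data.Dataset.list_files(file_pattern)

        def _parse(path):
            image = tf.image.decode_jpeg(tf.read_file(path), channels=3)
            image = tf.image.convert_image_dtype(image, tf.float32)  # scales to [0, 1]
            return tf.image.resize_images(image, size)

        return dataset.map(_parse).batch(batch_size)

    # e.g. loader = imagenet_loader('/data/imagenet/train/*.JPEG')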

Justin-Tan avatar May 29 '18 10:05 Justin-Tan

First I trained my model on Cityscapes for 60 epochs, then continued training on ADE20k for 10 epochs, and I find the compression quality becomes worse. Maybe the model doesn't converge. I think it is hard to compress arbitrary images with one model.

chenxianghu avatar Jun 04 '18 03:06 chenxianghu

Don't train using Cityscapes initially, just train using ADE20k. Make sure you pull the latest version, I fixed a couple of errors in the code.

It will take a long time for the model to converge on ADE20k; the authors originally trained for 50 epochs to get the results in the paper.

Justin-Tan avatar Jun 04 '18 09:06 Justin-Tan

OK, this morning I also read the paper and realized I should train on ADE20k from scratch, but an error occurred: it seems the shapes of self.w_hat and Gv didn't match, so I disabled sampling noise by adding a condition like the one below. It is now training without errors. Thank you!

        if config.sample_noise and dataset != 'ADE20k':
            print('Sampling noise...')
            # noise_prior = tf.contrib.distributions.Uniform(-1., 1.)
            # self.noise_sample = noise_prior.sample([tf.shape(self.example)[0], config.noise_dim])
            noise_prior = tf.contrib.distributions.MultivariateNormalDiag(
                loc=tf.zeros([config.noise_dim]), scale_diag=tf.ones([config.noise_dim]))
            v = noise_prior.sample(tf.shape(self.example)[0])
            Gv = Network.dcgan_generator(v, config, self.training_phase,
                                         C=config.channel_bottleneck, upsample_dim=config.upsample_dim)
            print('Gv:', Gv)
            # Concatenate the generated noise features onto the quantized representation.
            self.z = tf.concat([self.w_hat, Gv], axis=-1)
        else:
            self.z = self.w_hat
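If the mismatch is only in the spatial dimensions (an assumption; adapting dcgan_generator, as discussed later in this thread, is the other option), an alternative to disabling the noise would be to resize Gv to w_hat's spatial shape before concatenating:

            # Hypothetical workaround: match Gv's spatial dims to self.w_hat.
            Gv = tf.image.resize_images(Gv, tf.shape(self.w_hat)[1:3])
            self.z = tf.concat([self.w_hat, Gv], axis=-1)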

chenxianghu avatar Jun 04 '18 09:06 chenxianghu

I modified the network as you did, but there is still a problem: Incompatible shapes: [1,3,688,512] vs. [1,3,683,512] at line 127 of model.py: distortion_penalty = config.lambda_X * tf.losses.mean_squared_error(self.example, self.reconstruction). Do you have any suggestion?

wensihan avatar Jun 12 '18 03:06 wensihan

@chenxianghu

wensihan avatar Jun 12 '18 03:06 wensihan

The shapes of self.example and self.reconstruction should be the same; for the Cityscapes dataset it should be [1, 512, 1024, 3], i.e. [batch_size, height, width, channels].
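A guess at the cause of the 688 vs. 683 mismatch, assuming the usual 16x total down/upsampling in this architecture: 683 is not a multiple of 16, so the decoder's output rounds up to 43 * 16 = 688. A common fix is to crop (or pad) the input to a multiple of 16 first, e.g.:

    # Hypothetical pre-crop, assuming an NHWC image tensor; adjust the axes
    # if your layout differs (the error above puts the 683 on another axis).
    factor = 16  # assumed total downsampling factor of the encoder
    h = (tf.shape(image)[1] // factor) * factor
    w = (tf.shape(image)[2] // factor) * factor
    image = image[:, :h, :w, :]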

chenxianghu avatar Jun 12 '18 03:06 chenxianghu

I use the ADE20K dataset with only the width rescaled to 512px. Are there any other parameters to change besides disabling the sample noise? @chenxianghu

wensihan avatar Jun 12 '18 04:06 wensihan

I modified many places:

  1. made my own h5 file, using only the 200x200 to 975x975 JPEG images in ADE20K (as in the paper); see the sketch below
  2. resized images to [512, 512], with no padding or cropping
  3. used tf.image.decode_jpeg, not tf.image.decode_png
  4. modified Network.dcgan_generator to adapt to [512, 512]
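For step 1, a rough sketch of the size filtering (my own illustration; the h5 layout here, a single column of file paths, is an assumption to match against what the repo's data loader actually reads):

    import os

    import pandas as pd
    from PIL import Image

    paths = []
    for root, _, files in os.walk('ADE20K/images/training'):
        for f in files:
            if not f.endswith('.jpg'):
                continue
            p = os.path.join(root, f)
            w, h = Image.open(p).size
            # Keep only images within the paper's stated size range.
            if 200 <= w <= 975 and 200 <= h <= 975:
                paths.append(p)

    # Assumed format: an HDF5 table with one image path per row.
    pd.DataFrame({'path': paths}).to_hdf('ADE20k_paths_train.h5', key='df')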

I think it would be better for you to learn some basics first and then try to train your own model!

chenxianghu avatar Jun 12 '18 05:06 chenxianghu

@chenxianghu First, thank you very much for your reply. I still have a question: does 200x200 to 975x975 mean that images larger or smaller than this are excluded? And then the dataset contains fewer than 20210 training images, right?

wensihan avatar Jun 12 '18 06:06 wensihan

Yes. This is the description from the original paper:

Data sets: We train the proposed method on two popular data sets that come with hand-annotated semantic label maps, namely Cityscapes [42] and ADE20k [43]. Both of these data sets were previously used with GANs [12, 33], hence we know that GANs can model their distribution, at least to a certain extent. Cityscapes contains 2975 training and 500 validation images of dimension 2048x1024px, which we resampled to 1024x512px for our experiments. The training and validation images are annotated with 34 and 19 classes, respectively. From the ADE20k data set we use the SceneParse150 subset with 20,210 training and 2000 validation images of a wide variety of sizes (200x200px to 975x975px), each annotated with 150 classes. During training, the ADE20k images are rescaled such that the width is 512px.

chenxianghu avatar Jun 12 '18 06:06 chenxianghu

I know this. I'm just puzzled: does this sentence (20,210 training and 2000 validation images of a wide variety of sizes (200x200px to 975x975px)) mean that the 20210 training images' sizes vary from 200x200 to 975x975?

wensihan avatar Jun 12 '18 06:06 wensihan

I checked: some JPEG images' sizes are not in the 200x200 to 975x975 range, e.g. ADE20K\images\training\h\hacienda\ADE_train_00008829.jpg is 1024x768.

chenxianghu avatar Jun 12 '18 06:06 chenxianghu

Yes, that is why I am puzzled... Okay, I see: the training dataset is smaller than 20210 images. Thank you~

wensihan avatar Jun 12 '18 06:06 wensihan

@chenxianghu Hi, do you add noise while training on ADE20K? I came across an error resulting from a mismatch between the noise's dimension and the encoder network's output, so I wonder if we have to change the method of generating noise. What's more, is your result acceptable on the ADE20K dataset? Mine is quite poor, and the generator has not converged after almost 40 epochs.

Jillian2017 avatar Jun 15 '18 07:06 Jillian2017

@Jillian2017 I add noise while training on the ADE20K dataset by modifying the Network.dcgan_generator function to adapt to 512x512. My generated image quality is also poor after 40 epochs; some generated images even contain strange colorized patches that don't exist in the original images. Do you see this too? I don't know why.

chenxianghu avatar Jun 19 '18 02:06 chenxianghu

@chenxianghu Hi, can you leave an email? I would like to ask you some questions.

zhiqiang-zhu avatar Jul 12 '18 02:07 zhiqiang-zhu