
For a GAN, why does my D loss increase and my G loss decrease to 0 at the beginning?

Open hefeiwangyande opened this issue 8 years ago • 10 comments

The generated pictures are just noise. At step 4650: G_loss_adv: 0.325, G_accuracy: 0.984,
D_loss_adv: 0.982, d_loss_pos: 0.598, d_loss_neg: 1.366,
D_accuracy: 0.258, d_pos_acc: 0.500, d_neg_acc: 0.016. My G_loss is lower than my D_loss, and the generated samples score significantly higher than the real pictures, so D behaves completely abnormally (normally D_loss should be small, since D can distinguish real from fake, right?). My D is four conv layers plus a fully connected layer. I don't know where I went wrong.

hefeiwangyande avatar Jan 12 '18 13:01 hefeiwangyande

Please fix your message, it is not readable.

And I doubt anyone will spend hours trying to debug your code; please come with a precise question.

DEKHTIARJonathan avatar Jan 12 '18 13:01 DEKHTIARJonathan

    G_loss_adv = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_fake_logit, labels=tf.ones_like(d_fake_logit)), name='g_loss')

    d_loss_pos = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_real_logit, labels=tf.ones_like(d_real_logit)), name='d_loss_real')
    d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')
    D_loss_adv = tf.add(.5 * d_loss_pos, .5 * d_loss_neg, name='d_loss')

    # accuracy metrics
    d_pos_acc = tf.reduce_mean(tf.cast(score_real > 0.5, tf.float32), name='accuracy_real')
    d_neg_acc = tf.reduce_mean(tf.cast(score_fake < 0.5, tf.float32), name='accuracy_fake')
    d_accuracy = tf.add(.5 * d_pos_acc, .5 * d_neg_acc, name='accuracy')

    g_accuracy = tf.reduce_mean(tf.cast(score_fake > 0.5, tf.float32), name='accuracy')

hefeiwangyande avatar Jan 13 '18 01:01 hefeiwangyande

In your implementation it looks like d_loss_fake should be different from g_loss_adv.

Assuming that: 1) G is the generator, outputting a fake image from a noise vector z, and 2) D is the discriminator, outputting the probability that the input is real:

one gets: g_loss_adv = D(G(z)) and d_loss_fake = 1 - D(G(z))
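To make the two losses concrete, here is a small numpy sketch (not the poster's TF graph, just the underlying math) of what `tf.nn.sigmoid_cross_entropy_with_logits` computes for the same fake logit under the two label choices: with labels of 1 it equals `-log D(G(z))` (the generator's loss), and with labels of 0 it equals `-log(1 - D(G(z)))` (the discriminator's fake-side loss):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_with_logits(logit, label):
    # Numerically stable form used by tf.nn.sigmoid_cross_entropy_with_logits:
    # max(x, 0) - x * z + log(1 + exp(-|x|))
    return np.maximum(logit, 0) - logit * label + np.log1p(np.exp(-abs(logit)))

d_fake_logit = 2.0         # hypothetical discriminator logit for a fake sample
p = sigmoid(d_fake_logit)  # D(G(z)): probability the fake is judged "real"

g_loss = bce_with_logits(d_fake_logit, 1.0)       # generator: wants D(G(z)) -> 1
d_loss_fake = bce_with_logits(d_fake_logit, 0.0)  # discriminator: wants D(G(z)) -> 0

assert np.isclose(g_loss, -np.log(p))             # -log D(G(z))
assert np.isclose(d_loss_fake, -np.log(1 - p))    # -log(1 - D(G(z)))
```

So even though both losses are built from the same logit, the different labels make them pull D(G(z)) in opposite directions.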

rafaelvalle avatar Jan 13 '18 01:01 rafaelvalle

@DEKHTIARJonathan Thanks for your suggestion, I have changed the information.

hefeiwangyande avatar Jan 13 '18 02:01 hefeiwangyande

@rafaelvalle Your suggestion is:

    d_loss_neg = tf.reduce_mean(1 - tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')

Actually, d_fake_logit = D(G(z)) in my implementation. For generated samples its value should be relatively small (close to 0), so I think d_loss_neg is not wrong, or is my understanding wrong?

hefeiwangyande avatar Jan 13 '18 02:01 hefeiwangyande

Let's look at the positive (real) and negative (adversarial) losses one by one. Assume D outputs the probability of the input being real.

A) If d_loss_pos is minimized using D(x) and the labels for x are 1, D minimizes its loss by pushing D(x) toward 1. B) If d_loss_neg is minimized using D(G(z)) and the labels for G(z) are 0, D minimizes its loss by pushing D(G(z)) toward 0.

Your problem could be that 1) the labels for x and G(z) are the same, instead of 1 and 0 respectively. 2) If that's not the problem, it could be that using D(G(z)) suffers from vanishing gradients early on, which is why people prefer to use 1 - D(G(z)).

Now let's assume you have 1 and 2 correct; where else could the problem be? Note that in your code below the generator and the discriminator minimize the same function. This is not correct, as they should minimize different loss functions. That's why I suggested changing g_loss_adv to 1 - d_fake_logit.

    g_loss_adv = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_fake_logit, labels=tf.ones_like(d_fake_logit)), name='g_loss')
    d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')
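As a numeric aside on point 2, the vanishing-gradient effect can be checked directly. In the standard framing, the minimax generator loss log(1 - D(G(z))) saturates when D confidently rejects fakes, while the non-saturating form -log D(G(z)) keeps a strong gradient. A plain numpy sketch, with a hypothetical early-training logit:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = -8.0  # logit early in training: D(G(z)) = sigmoid(-8) is roughly 0.0003

# d/dx of the saturating generator loss   log(1 - sigmoid(x))
grad_saturating = -sigmoid(x)
# d/dx of the non-saturating loss        -log(sigmoid(x))
grad_non_saturating = sigmoid(x) - 1.0

# The saturating loss has almost no gradient when D confidently rejects fakes,
# while the non-saturating loss still pushes hard in the right direction.
assert abs(grad_saturating) < 1e-3
assert abs(grad_non_saturating) > 0.99
```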

rafaelvalle avatar Jan 15 '18 16:01 rafaelvalle

@rafaelvalle I think I partly understand your meaning: you mean that my D(G(z)) may be too hard to reduce because of vanishing gradients, so I should choose to maximize 1 - D(G(z)) instead?

In addition, the parameters of d_loss_neg and g_loss_adv are not exactly the same:

    g_loss_adv = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_fake_logit, labels=tf.ones_like(d_fake_logit)), name='g_loss')
    d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')

hefeiwangyande avatar Jan 16 '18 02:01 hefeiwangyande

Oh, I missed the ones_like and zeros_like! Sorry for not reading carefully. There are many things that could be the reason:

  1. Loss function: try using 1 - D(G(z)) instead.
  2. Time: wait for a few iterations until training converges to a specific behavior, for example the generator always winning.
  3. Learning rates: try adjusting them so that the losing part has a higher learning rate.
  4. Discriminator vs. generator number of iterations: try adjusting them so that the losing part gets more iterations.
  5. Weight initialization: try Xavier uniform with the gain set according to the non-linearity.
  6. Noise vector: try using uniform noise instead of normal noise.
  7. Model capacity: try increasing the losing part's capacity.
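For item 5, a minimal sketch of Xavier/Glorot uniform initialization with a gain, in plain numpy rather than a TF initializer (the fan sizes here are made up for illustration):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, gain=1.0, seed=0):
    # Glorot/Xavier uniform: sample from U(-a, a)
    # with a = gain * sqrt(6 / (fan_in + fan_out))
    a = gain * np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-a, a, size=(fan_in, fan_out))

# The gain is chosen per non-linearity, e.g. sqrt(2) for ReLU, 1.0 for tanh/linear
w = xavier_uniform(512, 256, gain=np.sqrt(2.0))
assert abs(w).max() <= np.sqrt(2.0) * np.sqrt(6.0 / (512 + 256))
```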

Report back here if you find what the problem was, so that we all learn.

rafaelvalle avatar Jan 16 '18 16:01 rafaelvalle

You can also use label smoothing for the discriminator to weaken it.

I have submitted a PR on TensorFlow to help implement this feature: https://github.com/tensorflow/tensorflow/pull/16153

You can take inspiration from it to write your own custom code.
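A minimal sketch of one-sided label smoothing, assuming the loss setup from earlier in the thread: the real-side labels become 0.9 instead of 1.0 (e.g. `labels=0.9 * tf.ones_like(d_real_logit)`), which penalizes an over-confident D. In plain numpy, using the same formula `tf.nn.sigmoid_cross_entropy_with_logits` implements:

```python
import numpy as np

def bce_with_logits(logit, label):
    # Stable form of sigmoid cross-entropy: max(x,0) - x*z + log(1 + exp(-|x|))
    return np.maximum(logit, 0) - logit * label + np.log1p(np.exp(-abs(logit)))

d_real_logit = 5.0  # D is very confident a real sample is real

hard = bce_with_logits(d_real_logit, 1.0)    # hard labels = 1.0
smooth = bce_with_logits(d_real_logit, 0.9)  # one-sided smoothing: labels = 0.9

# Smoothed labels assign a nonzero loss to over-confident D outputs,
# so D is discouraged from saturating on the real samples
assert smooth > hard
```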

DEKHTIARJonathan avatar Jan 17 '18 08:01 DEKHTIARJonathan

I encountered the same issue, and finally found that I had forgotten to give each optimizer ONLY the gradients of its own part, discriminator or generator. If you don't pass something like var_list=generator_vars, the generator's optimizer will also update the discriminator's parameters and weaken it:

discriminator_vars = [var for var in tf.global_variables() if "discriminator" in var.name]
generator_vars = [var for var in tf.global_variables() if "generator" in var.name]

self.D_optimizer = tf.train.AdamOptimizer(learning_rate=2e-4).minimize(self.D_loss, var_list=discriminator_vars)
self.G_optimizer = tf.train.AdamOptimizer(learning_rate=2e-4).minimize(self.G_loss, var_list=generator_vars)
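The name-based filtering above can be sanity-checked without building a graph. The variable names below are hypothetical stand-ins for what `tf.variable_scope("generator")` / `tf.variable_scope("discriminator")` would produce:

```python
# Hypothetical variable names, standing in for tf.global_variables() entries
all_vars = ["discriminator/conv1/kernel", "discriminator/dense/bias",
            "generator/deconv1/kernel", "generator/dense/bias"]

discriminator_vars = [v for v in all_vars if "discriminator" in v]
generator_vars = [v for v in all_vars if "generator" in v]

# Each optimizer should only ever touch its own side's parameters:
# the two sets must be disjoint and together cover everything
assert set(discriminator_vars).isdisjoint(generator_vars)
assert len(discriminator_vars) + len(generator_vars) == len(all_vars)
```

The key point is that every trainable variable must land in exactly one of the two lists, which is only guaranteed if the networks are actually built under those variable scopes.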

danielkaifeng avatar Sep 12 '19 06:09 danielkaifeng