Deep_reinforcement_learning_Course

For those who struggle with tf.nn.softmax_cross_entropy_with_logits_v2 in Cartpole REINFORCE Monte Carlo Policy Gradients

Open gekator opened this issue 3 years ago • 0 comments

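Guys, if you struggle with neg_log_prob = tf.nn.softmax_cross_entropy_with_logits_v2(logits = fc3, labels = actions) in the Cartpole REINFORCE Monte Carlo Policy Gradients example: I spent some time working out what is happening there. With one-hot labels, the softmax cross-entropy is just the negative log-probability of the action that was taken, so you can replace that line with the code below: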

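# Action probabilities from the logits of the last layer
y_hat_softmax = tf.nn.softmax(fc3)
# Keep only the log-probability of the chosen action (actions is assumed one-hot)
y_cross = actions * tf.log(y_hat_softmax)
# Sum over the action axis and negate: one -log pi(a|s) value per step
neg_log_prob = -tf.reduce_sum(y_cross, 1)
# Weight each step by its discounted return and average over the episode
loss = tf.reduce_mean(neg_log_prob * discounted_episode_rewards_)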

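Also change the actions placeholder to float32, so that it can be multiplied with the float32 output of tf.log: actions = tf.placeholder(tf.float32, [None, action_size], name="actions")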
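If you want to convince yourself that the two forms agree, here is a minimal sketch in plain Python (no TensorFlow needed). The function names and the sample logits, actions, and rewards are made up for illustration and are not taken from the course notebook. The second function uses the log-sum-exp form of softmax cross-entropy, which is only my assumption of how a fused op computes it; the point is just that both give the same number.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def neg_log_prob_manual(logits, one_hot_action):
    # Mirrors the manual version: -sum(actions * log(softmax(logits))).
    probs = softmax(logits)
    return -sum(a * math.log(p) for a, p in zip(one_hot_action, probs))

def neg_log_prob_cross_entropy(logits, one_hot_action):
    # Softmax cross-entropy in log-sum-exp form:
    # -sum(labels * (logits - logsumexp(logits))).
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(a * (x - lse) for a, x in zip(one_hot_action, logits))

def reinforce_loss(batch_logits, batch_actions, discounted_rewards):
    # Mean over steps of -log pi(a|s) weighted by the discounted return.
    terms = [
        neg_log_prob_manual(logits, action) * reward
        for logits, action, reward in zip(batch_logits, batch_actions, discounted_rewards)
    ]
    return sum(terms) / len(terms)

# Hypothetical two-step episode with two actions (as in Cartpole).
batch_logits = [[2.0, 0.5], [0.1, 1.2]]
batch_actions = [[1.0, 0.0], [0.0, 1.0]]
discounted_rewards = [1.0, -0.5]

for logits, action in zip(batch_logits, batch_actions):
    manual = neg_log_prob_manual(logits, action)
    fused = neg_log_prob_cross_entropy(logits, action)
    print(manual, fused)

print(reinforce_loss(batch_logits, batch_actions, discounted_rewards))
```

Each pair printed by the loop should match up to floating-point rounding. The log-sum-exp form never takes the log of a probability that has underflowed to zero, which is one reason a fused cross-entropy op can be preferable to taking log(softmax(...)) by hand.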

gekator, Dec 08 '22 16:12