
Can tf.agent policy return probability vector for all actions?

Open bing-zhao opened this issue 5 years ago • 5 comments

I am trying to train a Reinforcement Learning agent following the TF-Agents DQN Tutorial. In my application, I have 9 discrete actions (labeled 0 to 8), and I would like to get the probability vector over all actions computed by the trained policy, so that I can do further processing in other application environments. However, the policy only returns a log_probability with a single value rather than a vector for all actions. Is there any way to get the probability vector?

import tensorflow as tf

from tf_agents.networks import q_network
from tf_agents.agents.dqn import dqn_agent
from tf_agents.policies import policy_saver
from tf_agents.utils import common

q_net = q_network.QNetwork(
    env.observation_spec(),
    env.action_spec(),
    fc_layer_params=(32,))

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.001)

my_agent = dqn_agent.DqnAgent(
    env.time_step_spec(),
    env.action_spec(),
    q_network=q_net,
    epsilon_greedy=epsilon,
    optimizer=optimizer,
    emit_log_probability=True,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=global_step)

my_agent.initialize()

...  # training

tf_policy_saver = policy_saver.PolicySaver(my_agent.policy)
tf_policy_saver.save('./policy_dir/')

# making decision using the trained policy
action_step = my_agent.policy.action(time_step)

In dqn_agent.DqnAgent(), I set emit_log_probability=True, which according to the documentation controls whether policies emit log probabilities.

However, when I run action_step = my_agent.policy.action(time_step), it returns PolicyStep(action=<tf.Tensor: shape=(1,), dtype=int64, numpy=array([1], dtype=int64)>, state=(), info=PolicyInfo(log_probability=<tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>))

I also tried running action_distribution = saved_policy.distribution(time_step), which returns PolicyStep(action=<tfp.distributions.DeterministicWithLogProb 'Deterministic' batch_shape=[1] event_shape=[] dtype=int64>, state=(), info=PolicyInfo(log_probability=<tf.Tensor: shape=(), dtype=float32, numpy=0.0>))

If there is no such API available in TF-Agents, is there another way to get such a probability vector? Thanks.

bing-zhao avatar Aug 24 '20 06:08 bing-zhao

Ummm could you print out env.action_spec() for your environment?

summer-yue avatar Aug 26 '20 19:08 summer-yue

Ummm could you print out env.action_spec() for your environment?

Thanks for the response. The output for env.action_spec() is as below: BoundedTensorSpec(shape=(), dtype=tf.int64, name='action', minimum=array(0, dtype=int64), maximum=array(8, dtype=int64))

By the way, is there a way to return the Q-values for the different actions? If that is possible, maybe I can just pass the Q-values through a softmax function and get the probabilities?
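
A minimal sketch of that idea, assuming the q_net and time_step objects defined in the original post (the variable names below are illustrative, not an official TF-Agents API):

import tensorflow as tf

# Query the trained Q-network directly for the per-action Q-values,
# then turn them into a probability vector with a softmax.
q_values, _ = q_net(time_step.observation, step_type=time_step.step_type)
action_probabilities = tf.nn.softmax(q_values, axis=-1)  # shape (1, 9) for 9 actions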

bing-zhao avatar Aug 27 '20 01:08 bing-zhao

Hello, Bing. I have the same problem as you. Could you tell me how you solved it, please?

David-zreo avatar May 17 '21 03:05 David-zreo

Hello @bing-zhao, I am also facing the same issue. Did you get any solution for obtaining the Q-values?

apurva-octro avatar Jun 03 '22 07:06 apurva-octro

This might be helpful: go to greedy_policy.py and find the function def _distribution(self, time_step, policy_state). There you can see that it returns DeterministicWithLogProb(loc=greedy_action), where greedy_action = dist.mode(), which is why the probabilities it reports are always just 0 or 1. If you want the probability of each action, dist.prob(that action) on the underlying distribution is what you need; see the sketch below.
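
A sketch of that suggestion, assuming the my_agent and time_step objects from the original post and the GreedyPolicy/QPolicy pairing that DqnAgent normally uses (names like inner_policy are illustrative):

import tensorflow as tf

# The DQN agent's policy is a GreedyPolicy wrapping a QPolicy; the wrapped
# policy's distribution is a categorical over the 9 actions, built from the Q-values.
inner_policy = my_agent.policy.wrapped_policy
dist = inner_policy.distribution(time_step).action
# dist.prob(a) is the probability of action a; stack them into a vector.
action_probabilities = tf.stack([dist.prob(a) for a in range(9)], axis=-1)  # shape (1, 9)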

FalsitaFine avatar Mar 28 '23 23:03 FalsitaFine