policy export error
Hi,
When I use NFSP to train on my environment, I get the following error:
RuntimeError: Function 'SoftmaxBackward0' returned nan values in its 0th output.
While debugging, I found that self.policy(state) outputs 0 for some actions inside agent.update.
Because part of the output is 0, the corresponding values of log_probs are infinite. In my environment, observation_space and action_space are defined as follows:

    self.observation_space = spaces.Box(low=0, high=1000, shape=(4,), dtype=np.float32)
    self.action_space = spaces.Discrete(37)

Can you give me some suggestions? Thanks.
Sorry for the late reply.
It's hard to give constructive suggestions without more information about your training progress. NFSP uses DQN as its base agent and iteratively learns an approximate best response against a set of the opponent's historical strategies. Please check whether training a single DQN agent against a fixed policy also reproduces this error.
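Separately, the zero probabilities described in the question can be reproduced in isolation. The sketch below is plain Python, not the NFSP code, and the logits are hypothetical values chosen to have a large spread, which is one possible cause when raw observations in [0, 1000] reach the network unscaled. It shows that a softmax can underflow to exactly 0, and that computing log-probabilities directly from the logits, or clamping the probabilities before the log, keeps the values finite.

```python
import math

def softmax(logits):
    """Max-shifted softmax; very small probabilities can still underflow to 0.0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    """Log-probabilities computed from the logits, so log(0) is never formed."""
    m = max(logits)
    log_total = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - log_total for x in logits]

def safe_log(probs, eps=1e-8):
    """Fallback when only probabilities are available: clamp before the log.
    eps is an assumed value, not one taken from the thread."""
    return [math.log(max(p, eps)) for p in probs]

# Hypothetical logits with a large spread (assumption for illustration only).
logits = [0.0, 5.0, 1000.0]

probs = softmax(logits)
print(probs)                 # the first two entries underflow to 0.0
print(log_softmax(logits))   # finite log-probabilities
print(safe_log(probs))       # finite, but clipped at log(eps)
```

If this matches what you see, the usual PyTorch counterparts are torch.nn.functional.log_softmax applied to the logits, or clamping the probabilities before torch.log. Scaling the observations to a smaller range may also help, though that is a guess without seeing the training run.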