agents Random_TF_Environment Bug

Hi there, I think I've found a minor bug in the Random_TF_Enviroment class.

I've discovered that it fails when the time_step_spec includes a reward_spec that is a multi-dimensional array (as opposed to a scalar).

I believe this is due to the code on line 91, where under the _reset method, it calls ts.restart without specifying the reward_spec. In this situation, ts.restart assumes the reward_spec is a single scalar which causes the error

Suggestion: include reward = self.time_step_spec.reward when ts.restart is called.

Apr 20 '21 11:04 ngroves08

Thanks! This is a great candidate for a first contribution. Can you send a PR with a small unit test?

Apr 20 '21 18:04 ebrevdo

Happy to help - can you point me in the direction of a good example of a unit test. This is all new to me...

Apr 21 '21 11:04 ngroves08

You can look at the existing test case and add a new test in:

https://github.com/tensorflow/agents/blob/master/tf_agents/environments/random_tf_environment_test.py#L93

You will probably have to add a new __init__ param to the base TFEnvironment for the reward spec, modify the reward_spec property, and then pipe it through the RandomTFEnvironment

Non scalar reward specs are something that we added not too long ago for a very specific project and unfortunately the test coverage and usability isn't piped everywhere. Contributions here would be awesome!

Apr 21 '21 16:04 oars

I've created pull requests for both RandomTFEnvironment and RandomTFEnvironmentTest

I've also run python setup.py test and it passed all the relevant tests. (Around 5 tests failed claiming it couldn't find libpython3.7m.so.1.0, but this also occurred when I ran the test before making any changes)

I didn't update TFEnvironment since it is already passed the reward_spec as part of the time_step_spec

Let me know if this is right

Apr 24 '21 11:04 ngroves08