Random_TF_Environment Bug
Hi there, I think I've found a minor bug in the Random_TF_Enviroment class.
I've discovered that it fails when the time_step_spec includes a reward_spec that is a multi-dimensional array (as opposed to a scalar).
I believe this is due to the code on line 91, where under the _reset method, it calls ts.restart without specifying the reward_spec. In this situation, ts.restart assumes the reward_spec is a single scalar which causes the error
Suggestion: include reward = self.time_step_spec.reward when ts.restart is called.
Thanks! This is a great candidate for a first contribution. Can you send a PR with a small unit test?
Happy to help - can you point me in the direction of a good example of a unit test. This is all new to me...
You can look at the existing test case and add a new test in:
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/random_tf_environment_test.py#L93
You will probably have to add a new __init__ param to the base TFEnvironment for the reward spec, modify the reward_spec property, and then pipe it through the RandomTFEnvironment
Non scalar reward specs are something that we added not too long ago for a very specific project and unfortunately the test coverage and usability isn't piped everywhere. Contributions here would be awesome!
I've created pull requests for both RandomTFEnvironment and RandomTFEnvironmentTest
I've also run python setup.py test and it passed all the relevant tests. (Around 5 tests failed claiming it couldn't find libpython3.7m.so.1.0, but this also occurred when I ran the test before making any changes)
I didn't update TFEnvironment since it is already passed the reward_spec as part of the time_step_spec
Let me know if this is right