Fix gym.spaces iteration in GymWrapper
This is the code mentioned in issue #198.
In GymWrapper:
If self.action_space is e.g. Discrete(2) while action_space is [Discrete(2), Discrete(2), ...], then the above line will break, because a Discrete space cannot be iterated.
This is usually masked because the agents have a MultiDiscrete action space: the zip iterates over self.action_space, unpacking MultiDiscrete([2, 5]) into [Discrete(2), Discrete(5)], so an action_space of [MultiDiscrete([2, 5]), MultiDiscrete([2, 5])] behaves well.
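A rough sketch of the failure mode, assuming the line in question zips self.action_space against the per-agent list. The Discrete and MultiDiscrete classes here are minimal stand-ins written for illustration, not the real gym.spaces classes, and sub_spaces is a hypothetical helper, not code from GymWrapper:

```python
class Discrete:
    """Stand-in for gym.spaces.Discrete: a single space, not iterable."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return isinstance(other, Discrete) and self.n == other.n

    def __repr__(self):
        return f"Discrete({self.n})"


class MultiDiscrete:
    """Stand-in for gym.spaces.MultiDiscrete: indexing yields Discrete sub-spaces."""

    def __init__(self, nvec):
        self.nvec = list(nvec)

    def __len__(self):
        return len(self.nvec)

    def __getitem__(self, index):
        # Raises IndexError past the end, which is what ends iteration.
        return Discrete(self.nvec[index])


def sub_spaces(space):
    """Hypothetical helper: wrap a lone Discrete so it can be zipped safely."""
    if isinstance(space, Discrete):
        return [space]
    return list(space)


per_agent = [Discrete(2), Discrete(2)]

# Zipping a bare Discrete fails because it is not iterable.
try:
    list(zip(Discrete(2), per_agent))
    zip_failed = False
except TypeError:
    zip_failed = True

# A MultiDiscrete unpacks into its sub-spaces, which is why the bug is masked.
unpacked = sub_spaces(MultiDiscrete([2, 5]))
wrapped = sub_spaces(Discrete(2))
```

Wrapping the single-space case before iterating is only one possible way to handle it; the actual fix in this PR may differ.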
Can you give an example of how/when that would happen in Griddly so I know what I'm working with?
I am like 99% sure RLlib would break if that were to happen mid-training, but you're right that we should handle that case as well.
So when levels are different sizes but everything is fully observable, the obs-space will change between levels. Yes, this will break RLlib, but other libraries can support this.