Projection Network for more than 1 action with differing action spaces
Following the discussion in #37, I developed a MultiCategoricalProjectionNetwork that splits the logits and creates the respective Categorical distributions. I tried to adhere to the same pattern as far as I could. It can be found here: https://gist.github.com/sidney-tio/66abada949f1b629dd9ee28777d402d5
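For readers who have not opened the gist, the core idea can be sketched roughly as follows. This is a plain-Python illustration, not the code from the gist: the helper names (`split_logits`, `softmax`, `sample_actions`) and the `nvec` argument (number of choices per action, as in gym.spaces.MultiDiscrete) are assumptions made for the example.

```python
import math
import random


def split_logits(logits, nvec):
    """Split one flat list of logits into one chunk per action dimension."""
    if len(logits) != sum(nvec):
        raise ValueError("expected %d logits, got %d" % (sum(nvec), len(logits)))
    chunks, start = [], 0
    for n in nvec:
        chunks.append(logits[start:start + n])
        start += n
    return chunks


def softmax(chunk):
    """Turn one chunk of logits into categorical probabilities."""
    m = max(chunk)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in chunk]
    total = sum(exps)
    return [e / total for e in exps]


def sample_actions(logits, nvec, rng=random):
    """Sample one index per action dimension from its own categorical."""
    return [
        rng.choices(range(n), weights=softmax(chunk))[0]
        for n, chunk in zip(nvec, split_logits(logits, nvec))
    ]


# Hypothetical example: two actions with 4 and 3 choices respectively.
flat_logits = [0.1, 0.2, 0.3, 0.4, 1.0, 2.0, 3.0]
actions = sample_actions(flat_logits, nvec=[4, 3])
```

Here the network would emit `sum(nvec)` logits in a single output layer, and each slice parameterises an independent categorical distribution.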
If the team would like, I could raise a PR based on the gist I developed. From what I see, these are the to-dos to make it PR-worthy:
- [x] add tests
- [x] add more detailed docstrings
- [x] add masks
Have you tried using a nested action space instead, in which each action can have a different number of possible values?
No, I don't think I have tried that. I assume you are referring to something like a gym.spaces.Dict type of nested structure where we could specify {'action1': 4, 'action2': 3}? Could you elaborate further?
Yeah, you can use gym.spaces.Dict or directly nested ArraySpecs to define the actions. Then each one can have its own Categorical distribution, and sampling will sample all of them.
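As a rough illustration of the nested approach, assuming each named action simply carries its own list of logits (the function name `sample_nested` and the dict layout are hypothetical, not part of any library API):

```python
import math
import random


def sample_nested(logits_by_action, rng=random):
    """Sample every action in a nested (dict) action space.

    Each entry gets its own categorical distribution, so the
    entries may have different numbers of choices.
    """
    actions = {}
    for name, logits in logits_by_action.items():
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in logits]  # unnormalised probabilities
        actions[name] = rng.choices(range(len(logits)), weights=weights)[0]
    return actions


# Hypothetical nested action space: 4 choices for action1, 3 for action2.
sampled = sample_nested({
    "action1": [0.1, 0.2, 0.3, 0.4],
    "action2": [1.0, 2.0, 3.0],
})
```

The result keeps the same keys as the input, with one sampled index per action.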
My current workflow was to generate a spec from a gym.spaces.MultiDiscrete instance before creating the network.
I can see why something like a nested action space would be useful. I also just tried generating the spec from a gym.spaces.Dict; it would need to loop through the iterable before generating the respective Categorical distributions.
I'll add a function to check for an iterable and extract the relevant information. Let me make the changes and, if it's okay, I will raise a draft PR.
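A minimal sketch of what such a helper might look like, assuming it only needs the number of choices per action. The name `action_sizes` is hypothetical. It relies on the `nvec` attribute of gym.spaces.MultiDiscrete, the `spaces` mapping of gym.spaces.Dict, and the `n` attribute of gym.spaces.Discrete; stand-in objects are used below so the snippet runs without gym installed.

```python
from types import SimpleNamespace


def action_sizes(space):
    """Return the number of choices per action for a supported space.

    MultiDiscrete-like (has .nvec)  -> list of ints
    Dict-like (has .spaces mapping) -> dict of name -> int
    Discrete-like (has .n)          -> single-element list
    """
    if hasattr(space, "nvec"):
        return [int(n) for n in space.nvec]
    if hasattr(space, "spaces"):
        # Assumes a mapping of Discrete sub-spaces; tuples or deeper
        # nesting are not handled in this sketch.
        return {name: int(sub.n) for name, sub in space.spaces.items()}
    if hasattr(space, "n"):
        return [int(space.n)]
    raise TypeError("unsupported action space: %r" % (space,))


# Stand-ins that mimic the attributes of the gym spaces named above.
multi = SimpleNamespace(nvec=[4, 3])
nested = SimpleNamespace(spaces={
    "action1": SimpleNamespace(n=4),
    "action2": SimpleNamespace(n=3),
})

multi_sizes = action_sizes(multi)
nested_sizes = action_sizes(nested)
```

Returning a dict for the nested case keeps the action names available, so the distributions can be built and sampled per key.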
Hello, not sure if it was missed, but the PR for this issue is up. Could I request a review, please?