
Adds reward groups to the reward manager

Open · Dhoeller19 opened this pull request 1 year ago · 1 comment

Description

This MR ports the code from @lorenwel and adds the concept of reward groups. In the manager-based workflow, reward terms can now be grouped together, and the computations for different groups are done independently. This is useful in multi-agent scenarios, where different rewards must be computed for different agents.

The reward manager still supports the previous case where no groups are defined.
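To illustrate the idea, here is a minimal, dependency-free sketch of per-group reward computation. The class name `SimpleRewardManager` and its term format are hypothetical and chosen for illustration only; they are not the actual IsaacLab API.

```python
class SimpleRewardManager:
    """Sketch of a reward manager that sums weighted terms per group.

    Each term is a (name, group, weight, fn) tuple, where fn maps the
    environment state to a scalar reward contribution.
    """

    def __init__(self, terms):
        # Bucket the terms by their group so each group can be
        # evaluated independently (e.g. one group per agent).
        self._groups = {}
        for name, group, weight, fn in terms:
            self._groups.setdefault(group, []).append((name, weight, fn))

    def compute(self, env_state):
        # Return one aggregated reward per group; groups do not
        # interact, so different agents can consume different entries.
        return {
            group: sum(weight * fn(env_state) for _, weight, fn in terms)
            for group, terms in self._groups.items()
        }
```

For example, two agents with disjoint reward terms would each receive their own entry in the returned dictionary, while a setup with a single unnamed group degenerates to the previous single-tensor behavior.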

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • [ ] I have run the pre-commit checks with ./isaaclab.sh --format
  • [ ] I have made corresponding changes to the documentation
  • [ ] My changes generate no new warnings
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • [ ] I have added my name to the CONTRIBUTORS.md or my name already exists there

Dhoeller19 avatar Jul 25 '24 15:07 Dhoeller19

Thanks for porting and cleaning this up. I just checked this PR against our internal up-to-date version of #221 and just judging by the git diff, there doesn't seem to be a functional difference. So all good from my side.

lorenwel avatar Jul 25 '24 15:07 lorenwel

@lorenwel @Mayankm96 @jsmith-bdai

I appreciate the effort in implementing this feature. I’ve been using it for my manager-based multi-agent setup, and it has been quite helpful.

I have a suggestion: it might be beneficial to extend this to the termination manager as well. Specifically, termination_manager.terminated and termination_manager.timeout should return dictionaries. While timeout will remain the same across agents due to the single episode length, the terminated signal is crucial for identifying which specific agent or robot has met all of its termination conditions.

This change would enable computing returns more accurately. Currently, a single tensor indicates termination for all agents, even if only one agent has terminated, which introduces potential bias into return calculations. It wouldn't require modifying the logic for computing done in the manager-based RL environment; done would still be true if at least one agent has terminated.
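The relationship described above can be sketched in a few lines. This is a hypothetical illustration of the suggestion, not existing IsaacLab code: terminated becomes a per-agent dictionary, while the environment-level done remains the logical OR across agents.

```python
def compute_done(terminated: dict, time_out: bool) -> bool:
    """Environment-level done: reset if any agent terminated or time ran out.

    `terminated` maps agent names to per-agent termination flags, as
    proposed above; `time_out` is shared since all agents follow a
    single episode length.
    """
    return time_out or any(terminated.values())


def bootstrap_mask(terminated: dict, time_out: bool) -> dict:
    """Per-agent flag for whether to bootstrap the value at episode end.

    An agent that did not itself terminate (the env reset because of
    another agent or a timeout) should still bootstrap its return,
    avoiding the bias introduced by a single shared terminated tensor.
    """
    return {agent: (not done or time_out) for agent, done in terminated.items()}
```

With a shared terminated tensor, both agents in `{"a": False, "b": True}` would be treated as terminated; with the per-agent dictionary, agent "a" can still bootstrap its value estimate at the reset.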

Overall, this feature would be particularly useful for correctly computing returns in multi-agent scenarios. Let me know your thoughts!

bikcrum avatar Feb 21 '25 19:02 bikcrum