
feature(zms): add new league middlewares and other models and tools.

Open hiha3456 opened this issue 3 years ago • 0 comments

Description

  1. add LeagueCoordinator, LeagueLearnerCommunicator, StepLeagueActor, BattleStepCollector, battle_inferencer, battle_rolloutor, with their corresponding tests.
  2. add BattleTransitionList to gather transitions and cut trajectories for the league environment
  3. add dataclass of actor data and learner model
  4. add an attribute return_original_data to EnvSupervisor
  5. add BattleContext into context.py
  6. add EventEnum, and extend event_loop so that customized strings can be added to chosen events
  7. add the sparse_logging util, which logs at a sparse (reduced) frequency
  8. add my_pickle_loads inside ding/framework/parallel.py so that a CUDA tensor can be transferred to a CPU-only node without errors
  9. add old Storage and FileStorage class used in old dev-league branch inside ding/framework/storage
  10. make minor changes to player
  11. add sl_branch inside starcraft_player
  12. add old ding/league/v2/base_league.py used in old dev-league branch
  13. add steve, and change upgo and vtrace, in ding/rl_utils
  14. add detach_grad, flatten in ding/torch_utils/data_helper.py
  15. add l2_distance in ding/torch_utils/metric.py
  16. add GLU2, GatedConvResBlock, scatter_connection_v2, AttentionPool and lstm in ding/torch_utils/network/, and make some changes to the networks
  17. add read_yaml_config in ding/utils/default_helper.py
  18. and other changes...
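
For reviewers unfamiliar with the idea behind item 7, here is a minimal sketch of logging at a sparse frequency. It is not the sparse_logging code in this PR: the function name `sparse_log`, its parameters and the per-key counter are assumptions made only for illustration.

```python
import logging
from collections import defaultdict

# Hypothetical per-key call counters; the real util may track this differently.
_call_counts = defaultdict(int)


def sparse_log(logger, key, message, every_n=100, level=logging.INFO):
    """Emit `message` only on every `every_n`-th call for `key`, starting with the first.

    Returns True if the message was emitted, False if it was skipped.
    """
    count = _call_counts[key]
    _call_counts[key] += 1
    if count % every_n == 0:
        logger.log(level, "%s (call %d)", message, count + 1)
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("league")
    for step in range(250):
        # Only calls 1, 101 and 201 reach the log.
        sparse_log(log, "collector_step", "collector is alive", every_n=100)
```

Keying the counter by a string lets several hot loops share one helper without throttling each other.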

Related Issue

TODO

  1. delete old checkpoints saved by LeagueLearnerCommunicator, because they can consume disk space very quickly
  2. When multiple envs are used inside env_manager (or its variants), in some cases only some of the envs work properly. For the timesteps returned by step(action), I therefore strongly recommend that all EnvManagers return them as a dict instead of a list. As far as I know, EnvSupervisor and BaseEnvManagerV2 return a dict, while subprocesEnvManager and BaseEnvManager return a list.
  3. The BattleCollector currently cannot handle a policy that keeps intermediate state. For example, the SC2 policy (in DI-star) maintains a huge amount of intermediate state. In this case, each env should maintain its own set of policies, one per player, which the current BattleCollector does not do. The current BattleCollector can only handle this kind of policy when the EnvManager has a single environment.
  4. add middleware of teacher model.
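
To illustrate the recommendation in TODO item 2, here is a small sketch of why a dict keyed by env id is easier to consume than a list when only some envs return. The helper `timesteps_to_dict` and the sample data are hypothetical and do not reflect the actual EnvManager return types.

```python
from typing import Any, Dict, List


def timesteps_to_dict(ready_env_ids: List[int], timesteps: List[Any]) -> Dict[int, Any]:
    """Pair each returned timestep with the id of the env that produced it."""
    if len(ready_env_ids) != len(timesteps):
        raise ValueError("ready_env_ids and timesteps must have the same length")
    return dict(zip(ready_env_ids, timesteps))


if __name__ == "__main__":
    # Assume 3 envs, but env 1 did not produce a timestep in this step.
    as_list = ["ts_from_env0", "ts_from_env2"]
    # With a list, index 1 silently refers to env 2.
    as_dict = timesteps_to_dict([0, 2], as_list)
    for env_id, ts in as_dict.items():
        print(env_id, ts)
```

With the dict form, the collector can route each timestep to the right per-env buffer without assuming that every env responded.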

Check List

  • [ ] merge the latest version source branch/repo, and resolve all the conflicts
  • [ ] pass style check
  • [ ] pass all the tests

hiha3456, Aug 26 '22 07:08