DI-engine
feature(zms): add new league middlewares and other models and tools.
Description
- add LeagueCoordinator, LeagueLearnerCommunicator, StepLeagueActor, BattleStepCollector, battle_inferencer and battle_rolloutor, along with their corresponding tests.
- add BattleTransitionList to gather transitions and cut trajectories for the league environment
- add dataclasses for actor data and the learner model
- add an attribute return_original_data to EnvSupervisor
- add BattleContext to context.py
- add EventEnum, and extend event_loop so that customized strings can be appended to chosen events
- add the utility sparse_logging to log at a sparse (reduced) frequency
- add my_pickle_loads to ding/framework/parallel.py so that a CUDA tensor can be transferred to a CPU-only node without errors
- add the Storage and FileStorage classes from the old dev-league branch to ding/framework/storage
- make minor changes to player
- add sl_branch to starcraft_player
- add ding/league/v2/base_league.py from the old dev-league branch
- add steve, and modify upgo and vtrace in ding/rl_utils
- add detach_grad, flatten in ding/torch_utils/data_helper.py
- add l2_distance in ding/torch_utils/metric.py
- add GLU2, GatedConvResBlock, scatter_connection_v2, AttentionPool and lstm to ding/torch_utils/network/, and make some changes to other networks
- add read_yaml_config in ding/utils/default_helper.py
- other miscellaneous changes
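A minimal sketch of what logging "at a sparse frequency" can look like, assuming the throttle is a per-key call count. The class `SparseLogger`, its `interval` parameter and its `info(key, msg, *args)` method are hypothetical and are not the signature of DI-engine's `sparse_logging`, which may throttle on a different criterion (for example elapsed time).

```python
import logging
from collections import defaultdict


class SparseLogger:
    """Hypothetical helper: forwards a message to `logger` only on the 1st,
    (interval+1)-th, (2*interval+1)-th, ... call for each key."""

    def __init__(self, logger: logging.Logger, interval: int = 100) -> None:
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self.logger = logger
        self.interval = interval
        self._counts = defaultdict(int)  # number of calls seen per key

    def info(self, key: str, msg: str, *args) -> bool:
        """Log `msg % args` if this call falls on the interval; return whether it was logged."""
        count = self._counts[key]
        self._counts[key] = count + 1
        if count % self.interval == 0:
            self.logger.info(msg, *args)
            return True
        return False


logging.basicConfig(level=logging.INFO)
sparse = SparseLogger(logging.getLogger("league"), interval=3)
# Only calls 0, 3 and 6 actually reach the logger.
emitted = [sparse.info("collect", "collected step %d", i) for i in range(7)]
```

Keying the counter by message type keeps a noisy message (e.g. per-step collection) from suppressing a rare one that shares the same logger.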
Related Issue
TODO
- delete old checkpoints saved by LeagueLearnerCommunicator, because they can consume disk space very quickly
- When multiple envs are used inside env_manager (or its variants), in some cases only some of the envs work properly. Therefore, for the timesteps returned by step(action), I strongly recommend that all EnvManagers return them as a dict instead of a list. As far as I know, EnvSupervisor and BaseEnvManagerV2 return a dict, while SubprocessEnvManager and BaseEnvManager return a list.
- BattleCollector currently cannot handle policies that keep intermediate state; for example, the SC2 policy (in DI-star) maintains a huge amount of intermediate state. In this case, each env should maintain its own copy of every player's policy state, which the current BattleCollector does not do. It can only handle this kind of policy when the EnvManager has a single environment.
- add a teacher model middleware.
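A minimal sketch of the layout that the dict-timestep and per-env-state TODO items above argue for. All names (`Timestep`, `PerEnvPolicyState`, `update_states`) are hypothetical and are not DI-engine APIs. Timesteps are keyed by env id, so a step in which only some envs return still identifies each env unambiguously (a partial list loses that mapping). Policy state is stored per (env_id, player_id) and reset when that env's episode ends; an integer counter stands in for real intermediate state such as an LSTM hidden state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass
class Timestep:
    # Hypothetical minimal timestep; real EnvManagers return richer objects.
    obs: Any
    reward: float
    done: bool


class PerEnvPolicyState:
    """Hypothetical store holding one intermediate state per (env_id, player_id)."""

    def __init__(self, player_ids: Iterable[str], init_state: Callable[[], Any]) -> None:
        self.player_ids = list(player_ids)
        self.init_state = init_state
        self._states: Dict[int, Dict[str, Any]] = {}

    def get(self, env_id: int, player_id: str) -> Any:
        # Lazily create fresh states for every player the first time an env is seen.
        env_states = self._states.setdefault(
            env_id, {p: self.init_state() for p in self.player_ids}
        )
        return env_states[player_id]

    def set(self, env_id: int, player_id: str, state: Any) -> None:
        self.get(env_id, player_id)  # make sure the env entry exists
        self._states[env_id][player_id] = state

    def reset(self, env_id: int) -> None:
        self._states.pop(env_id, None)


def update_states(timesteps: Dict[int, Dict[str, Timestep]], store: PerEnvPolicyState) -> None:
    """Advance state only for the envs that actually returned this step."""
    for env_id, per_player in timesteps.items():
        if any(ts.done for ts in per_player.values()):
            # Episode finished: the next episode starts from fresh states.
            store.reset(env_id)
            continue
        for player_id in per_player:
            # Stand-in for a real recurrent update of the player's policy state.
            store.set(env_id, player_id, store.get(env_id, player_id) + 1)


store = PerEnvPolicyState(["player_0", "player_1"], init_state=lambda: 0)
# Only envs 0 and 2 returned this step; env 1 is still running.
step_result = {
    0: {"player_0": Timestep(None, 0.0, False), "player_1": Timestep(None, 0.0, False)},
    2: {"player_0": Timestep(None, 1.0, True), "player_1": Timestep(None, -1.0, True)},
}
update_states(step_result, store)
```

Because each env owns a separate state per player, the same store works whether the EnvManager runs one environment or many.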
Check List
- [ ] merge the latest version source branch/repo, and resolve all the conflicts
- [ ] pass style check
- [ ] pass all the tests