feature(zms): add new league middlewares and other models and tools.

hiha3456 opened this issue

Description

add LeagueCoordinator, LeagueLearnerCommunicator, StepLeagueActor, BattleStepCollector, battle_inferencer and battle_rolloutor, together with their corresponding tests
add BattleTransitionList to gather transitions and cut them into trajectories for the league environment
add dataclasses for actor data and learner models
add a return_original_data attribute to EnvSupervisor
add BattleContext to context.py
add EventEnum, and extend event_loop so that customized strings can be added to chosen events
add the sparse_logging utility to log at a sparse (reduced) frequency
add my_pickle_loads in ding/framework/parallel.py so that a CUDA tensor can be transferred to a CPU-only node without errors
add the old Storage and FileStorage classes used in the old dev-league branch to ding/framework/storage
make minor changes to player
add sl_branch to starcraft_player
add the old ding/league/v2/base_league.py used in the old dev-league branch
add steve, and modify upgo and vtrace in ding/rl_utils
add detach_grad and flatten to ding/torch_utils/data_helper.py
add l2_distance to ding/torch_utils/metric.py
add GLU2, GatedConvResBlock, scatter_connection_v2, AttentionPool and lstm to ding/torch_utils/network/, and make some changes to the networks
add read_yaml_config to ding/utils/default_helper.py
and other changes...
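The BattleTransitionList item above gathers transitions and cuts them into trajectories for the league environment. The following is a minimal sketch of that idea, not the actual DI-engine class. It assumes that transitions arrive as plain dicts tagged with an env_id, and that a trajectory is cut when it reaches traj_len steps or when a transition has done=True. The names TransitionBuffer and traj_len are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


class TransitionBuffer:
    """Hypothetical stand-in for BattleTransitionList: gathers transitions per env
    and cuts them into trajectories of at most `traj_len` steps."""

    def __init__(self, traj_len: int) -> None:
        self.traj_len = traj_len
        self._buffers: Dict[int, List[Dict[str, Any]]] = defaultdict(list)

    def append(self, env_id: int, transition: Dict[str, Any]) -> None:
        self._buffers[env_id].append(transition)

    def cut_trajectories(self, env_id: int) -> List[List[Dict[str, Any]]]:
        """Return finished trajectories for env_id and keep the unfinished tail buffered."""
        buf = self._buffers[env_id]
        trajectories = []
        start = 0
        for i, transition in enumerate(buf):
            # Cut when the current trajectory reaches traj_len or the episode ends.
            if i + 1 - start == self.traj_len or transition.get("done", False):
                trajectories.append(buf[start:i + 1])
                start = i + 1
        self._buffers[env_id] = buf[start:]
        return trajectories


buffer = TransitionBuffer(traj_len=3)
for step in range(5):
    buffer.append(0, {"step": step, "done": step == 4})
trajs = buffer.cut_trajectories(0)
print([len(t) for t in trajs])  # [3, 2]
```

Because the unfinished tail is kept per env, a single trajectory can span several collection rounds. Envs that are stepped at different rates also do not interfere with each other.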
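The EventEnum item above lets customized strings be added to chosen events, so that one event name can be specialized per receiver. Below is a hedged sketch of that pattern. The enum members, the "{actor_id}" placeholder convention and the tiny EventLoop class are all hypothetical illustrations, not DI-engine's event_loop API.

```python
from collections import defaultdict
from enum import Enum, unique
from typing import Any, Callable, DefaultDict, List


@unique
class EventEnum(str, Enum):
    # Hypothetical event names; "{...}" placeholders are filled in per receiver.
    COORDINATOR_DISPATCH_ACTOR_JOB = "on_coordinator_dispatch_actor_job_{actor_id}"
    LEARNER_SEND_MODEL = "on_learner_send_model_{player_id}"


class EventLoop:
    """Minimal in-process event loop used only for this illustration."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def on(self, event: str, fn: Callable[..., Any]) -> None:
        self._listeners[event].append(fn)

    def emit(self, event: str, *args: Any) -> None:
        for fn in self._listeners[event]:
            fn(*args)


loop = EventLoop()
received: List[str] = []

# Actor 0 subscribes only to jobs addressed to it.
loop.on(EventEnum.COORDINATOR_DISPATCH_ACTOR_JOB.value.format(actor_id=0), received.append)

# The coordinator dispatches one job to actor 1 and one to actor 0.
loop.emit(EventEnum.COORDINATOR_DISPATCH_ACTOR_JOB.value.format(actor_id=1), "job_for_actor_1")
loop.emit(EventEnum.COORDINATOR_DISPATCH_ACTOR_JOB.value.format(actor_id=0), "job_for_actor_0")
print(received)  # ['job_for_actor_0']
```

Keeping the base names in one enum avoids scattering raw event strings across middlewares. The formatted suffix routes each message to a single receiver.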
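The sparse_logging item above only says that it logs at a sparse frequency. This is a minimal sketch of one way to do that, assuming a count-based throttle per key. The name sparse_log and its parameters are hypothetical, and the real utility may be time-based or have a different signature.

```python
import logging
from collections import defaultdict
from typing import DefaultDict, Optional

# Number of times sparse_log has been called for each key (hypothetical global state).
_call_counts: DefaultDict[str, int] = defaultdict(int)


def sparse_log(key: str, msg: str, freq: int = 100, logger: Optional[logging.Logger] = None) -> bool:
    """Log `msg` on the 1st, (freq + 1)-th, (2 * freq + 1)-th, ... call for `key`.

    Returns True when the message was actually emitted.
    """
    count = _call_counts[key]
    _call_counts[key] = count + 1
    if count % freq != 0:
        return False
    (logger or logging.getLogger(__name__)).info(msg)
    return True


logging.basicConfig(level=logging.INFO)
emitted = [sparse_log("train_loss", f"loss at iter {i}", freq=10) for i in range(25)]
print(sum(emitted))  # 3: iterations 0, 10 and 20
```

Counting per key lets frequently called code paths (for example, a per-step loss print) share one helper without flooding the log.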

delete old checkpoints saved by LeagueLearnerCommunicator, because they can consume disk space very quickly
When multiple envs run inside an env_manager (or one of its variants), in some cases only some of the envs work properly. For the timesteps returned by step(action), I therefore strongly recommend that all EnvManagers return them as a dict (e.g. keyed by env id) instead of a list, so that each timestep stays associated with the env that produced it. As far as I know, EnvSupervisor and BaseEnvManagerV2 return a dict, while SubprocessEnvManager and BaseEnvManager return a list.
The BattleCollector currently cannot handle policies that keep intermediate state; for example, the SC2 policy (in DI-star) maintains a large amount of intermediate state. In this case, each env should maintain its own copy of every player's policy state, which the current BattleCollector does not do. As a result, the current BattleCollector can only handle this kind of policy when the EnvManager has only one environment.
add a teacher-model middleware
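The recommendation above to return timesteps as a dict rather than a list can be illustrated with a small simulation. The sketch assumes a hypothetical env manager with three envs, of which env 1 produces no timestep at this step. The Timestep type and both step functions are hypothetical, not DI-engine APIs.

```python
from typing import Dict, List, NamedTuple, Set


class Timestep(NamedTuple):
    env_id: int  # stored here only so the demo can show which env produced it
    obs: str
    done: bool


def step_returning_list(ready: Set[int], actions: Dict[int, int]) -> List[Timestep]:
    # Only envs in `ready` produce a timestep, so list position no longer equals env_id.
    return [Timestep(i, f"obs_env{i}_act{actions[i]}", False) for i in sorted(ready) if i in actions]


def step_returning_dict(ready: Set[int], actions: Dict[int, int]) -> Dict[int, Timestep]:
    # Keyed by env_id, so a missing env is simply an absent key.
    return {i: Timestep(i, f"obs_env{i}_act{actions[i]}", False) for i in sorted(ready) if i in actions}


actions = {0: 1, 1: 0, 2: 1}
ready = {0, 2}  # env 1 is not working properly at this step

as_list = step_returning_list(ready, actions)
as_dict = step_returning_dict(ready, actions)

# Naively treating the list position as env_id attributes env 2's timestep to env 1.
naive = {pos: ts for pos, ts in enumerate(as_list)}
print(naive[1].env_id)  # 2, not 1
print(sorted(as_dict))  # [0, 2]
```

With a list, every caller has to track separately which envs returned. With a dict, the mapping travels with the data.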
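The BattleCollector limitation described above arises because a stateful policy needs one state per env and per player. A hedged sketch of such bookkeeping follows. PerEnvPolicyState and toy_policy are hypothetical names, and the integer counter stands in for a real recurrent state such as an LSTM hidden state.

```python
from typing import Any, Callable, Dict, Tuple


class PerEnvPolicyState:
    """Hypothetical helper: keeps one policy state per (env_id, player_id) pair."""

    def __init__(self, init_state: Callable[[], Any]) -> None:
        self._init_state = init_state
        self._states: Dict[Tuple[int, int], Any] = {}

    def get(self, env_id: int, player_id: int) -> Any:
        # Lazily create a fresh state the first time an (env, player) pair is seen.
        key = (env_id, player_id)
        if key not in self._states:
            self._states[key] = self._init_state()
        return self._states[key]

    def set(self, env_id: int, player_id: int, state: Any) -> None:
        self._states[(env_id, player_id)] = state

    def reset_env(self, env_id: int) -> None:
        # Drop every player's state for an env whose episode has ended.
        for key in [k for k in self._states if k[0] == env_id]:
            del self._states[key]


def toy_policy(obs: Any, state: int) -> Tuple[int, int]:
    # Stand-in for a recurrent policy: the "state" only counts the steps taken.
    return 0, state + 1


store = PerEnvPolicyState(init_state=lambda: 0)
# Env 0 advances 3 steps, env 1 only 1 step; both envs have 2 players.
for env_id, n_steps in [(0, 3), (1, 1)]:
    for _ in range(n_steps):
        for player_id in (0, 1):
            _, new_state = toy_policy(None, store.get(env_id, player_id))
            store.set(env_id, player_id, new_state)

print(store.get(0, 1), store.get(1, 1))  # 3 1
store.reset_env(0)
print(store.get(0, 0))  # 0
```

If all envs shared a single state per player, as the collector effectively does, env 1's player states would be overwritten by env 0's steps. This is why the current BattleCollector only works with one environment.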

Aug 26 '22 07:08 hiha3456