蒲源 issues

Results: 41 issues of 蒲源

feature(pu): add efficientzero and related modules

add efficientzero policy and model, and related env and config, and migrate the existing alphazero demo ## Description ## Related Issue ## TODO ## Check List - [ ] merge...

algo

Question about the effect of torch_amp

Hi, first of all, thank you for open-sourcing your nice code! I have a question regarding the effect of torch_amp: I tested the training process of EfficientZero when using and...

feature(pu): add output_activation, output_norm_type, last_linear_layer_init_zero option for MLP

## Description add output_activation, output_norm_type, last_linear_layer_init_zero option for MLP ## Related Issue ## TODO ## Check List - [ ] merge the latest version source branch/repo, and resolve all the...

enhancement

feature(pu): add modified gym-hybrid including moving, sliding and hardmove env

## Description add modified gym-hybrid including moving, sliding and hardmove env ## Related Issue ## TODO ## Check List - [ ] merge the latest version source branch/repo, and resolve...

env

Question about the perspective transformation of two players when calculating Q?

Thanks for your open-sourced code very much. I am very confused about this code segment in the [backpropagate](https://github.com/werner-duvaud/muzero-general/blob/master/self_play.py#L406) method in self_play.py: when len(self.config.players) is 2, - in line [423](https://github.com/werner-duvaud/muzero-general/blob/master/self_play.py#L423): `min_max_stats.update(node.reward +...

Question about the effect of discount factor and done mask when calculating the target value?

Thanks for your open-sourced code very much. This is a common definition of a target value in classical RL: I'm a little confused about the way of calculating target value...
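
For readers unfamiliar with the terms in this question, here is a minimal sketch of how a discount factor and a done mask typically enter a one-step bootstrapped target, assuming the standard TD form `r + discount * (1 - done) * V(s')`. The function name `td_target` and its arguments are hypothetical and are not taken from the issue or from any of the repositories mentioned on this page.

```python
def td_target(reward, next_value, done, discount=0.99):
    """One-step TD target: r + discount * (1 - done) * V(s').

    Hypothetical helper for illustration only. When the episode has
    terminated (done=True), the mask zeroes the bootstrap term so the
    target reduces to the immediate reward.
    """
    not_done = 0.0 if done else 1.0
    return reward + discount * not_done * next_value


# Non-terminal step: the bootstrapped value is discounted and added.
mid_episode = td_target(reward=1.0, next_value=2.0, done=False, discount=0.5)

# Terminal step: the done mask removes the bootstrapped value.
terminal = td_target(reward=1.0, next_value=2.0, done=True, discount=0.5)
```

The same masking idea extends to n-step targets, where the bootstrap term is dropped once a terminal step falls inside the unroll window.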

Question about the effect of state encoding identity connection in dynamics network

Thanks for your open-sourced code very much. I'm a little confused about the reason for the identity connection of state encoding in [DynamicsNetwork](https://github.com/YeWR/EfficientZero/blob/main/config/atari/model.py#L252) in model.py: Why do we add this...

Question about the index of pad_child_visits_lst in selfplay_worker.py

Thanks for your open-sourced code very much. I am very confused about this code segment in the [put_last_trajectory](https://github.com/YeWR/EfficientZero/blob/main/core/selfplay_worker.py#L69) method in selfplay_worker.py: In [Line 69](https://github.com/YeWR/EfficientZero/blob/main/core/selfplay_worker.py#L69), why is `pad_child_visits_lst = game_histories[i].child_visits[beg_index:end_index]`...

WIP: UniZero: Generalized and Efficient Planning with Scalable World Models

- Our work is currently focused on developing a unified and scalable planning framework.
- Our code is partially based on https://github.com/eloialonso/iris.

enhancement

algorithm

discussion

research

feature(pu): add Go env, AlphaZero ctree and league training

- add go_env, related unittest
- add go mcts bot and alphazero/muzero config
- add league version of alphazero
- add ctree version of alphazero

enhancement

environment

algorithm