Rémy Portelas

Results: 11 comments by Rémy Portelas

Hmm right right. Thanks for the input. Then we could use a dedicated random state created from the original seed: ```python rnd_state = np.random.RandomState(self.config.seed + self.rank) envs = [self.config.new_game(rnd_state.randint(10**9)) for...
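A minimal sketch of the idea above, assuming each worker has an integer `rank` and a shared base seed. The helper name `make_env_seeds` and its parameters are hypothetical and do not come from the codebase; only the `RandomState(seed + rank)` and `randint(10**9)` calls mirror the snippet in the comment.

```python
import numpy as np


def make_env_seeds(base_seed, rank, num_envs):
    """Derive one seed per environment from a worker-specific random state.

    Hypothetical helper: names and signature are illustrative assumptions.
    """
    # A dedicated generator, so the global np.random state is left untouched.
    rnd_state = np.random.RandomState(base_seed + rank)
    # Each environment gets its own integer seed in [0, 10**9).
    return [int(rnd_state.randint(10**9)) for _ in range(num_envs)]


# The same (base_seed, rank) pair always yields the same seeds.
seeds_a = make_env_seeds(base_seed=0, rank=1, num_envs=4)
seeds_b = make_env_seeds(base_seed=0, rank=1, num_envs=4)
# A different rank yields a different stream.
seeds_c = make_env_seeds(base_seed=0, rank=2, num_envs=4)
```

One thing to keep in mind with `seed + rank`: different pairs can map to the same stream (for example seed 0 with rank 1 and seed 1 with rank 0), which may or may not matter depending on how runs are compared.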

Hello, First of all, thank you for releasing this much-needed open-source MuZero implementation :). ### Strengthening the relevance of this reproducibility issue Here are my performance results on CrazyClimber, 4...

Hello @yix081, The results you are showing are similar to those reported by the authors (well played!). Did you obtain them using the current version of the codebase, without additional modifications...

Hello @szrlee @yix081, Yes, I would be happy to share my modifications and collaborate on improving it :) I contacted the authors by email to get their permission to...

Hello @szrlee @yix081, I contacted the authors: they would like to look at this script themselves (ASAP) to find the exact bug instead of releasing a half-baked reanalyze script....

The X axis corresponds to training steps (not environment steps). My experiments were scheduled to run 900k training steps while performing 30M environment steps (I stopped them at around 600k)....
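To make the relation between the two axes concrete, here is a rough conversion sketch. It assumes environment steps are paced linearly against training steps over the whole schedule, which the comment does not state; the function name `env_steps_at` is hypothetical.

```python
# Scheduled totals quoted in the comment above.
TOTAL_TRAINING_STEPS = 900_000
TOTAL_ENV_STEPS = 30_000_000


def env_steps_at(training_step):
    """Approximate environment steps reached at a given training step.

    Assumption: environment steps grow linearly with training steps.
    """
    return training_step * TOTAL_ENV_STEPS // TOTAL_TRAINING_STEPS


# Average number of environment steps per training step under this schedule.
env_per_train = TOTAL_ENV_STEPS / TOTAL_TRAINING_STEPS

# Under the linearity assumption, stopping at about 600k training steps
# corresponds to about 20M environment steps.
env_steps_when_stopped = env_steps_at(600_000)
```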

Thanks for your suggestions :). I already tried to add periodic `gc.collect()`, which did not solve the issue. For your other suggested modifications, could you tell me a bit...

> but the change on codes relevant to `target_weights` makes the `train.sh` be runnable. Hmm interesting. Could it be just because you never get to load the target weights in...

### Strengthening the relevance of @emailweixu's reproducibility issue Here are my performance results on Freeway, 4 seeds: ![freeway_4seeds](https://user-images.githubusercontent.com/20358586/166712489-d94c2c2c-7ccd-460a-a5f5-b39842806d03.png) The 4 seeds obtained a score of 0 by the end of...

@emailweixu It is true that Freeway is challenging in terms of exploration; however, in both the EfficientMuzero paper and the [original MuZero paper](https://arxiv.org/pdf/1911.08265.pdf) (check Table S1 in the appendix), non-zero performance...