Question: Why not reanalyze 100% of the policy targets?

Open • Hwhitetooth opened this issue 3 years ago • 1 comment

Hi there,

First of all, great work, and thank you for open-sourcing your code!

I have a question about Reanalyze: you chose to reanalyze 99% of the policy targets and 100% of the value targets. I'm curious about the reasoning behind this choice. Did you try reanalyzing 100% of the policy targets, and if so, did it hurt performance?
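To make the question concrete: in MuZero-style Reanalyze, targets for old trajectories are recomputed with the latest network (policy targets by rerunning MCTS), while non-reanalyzed positions reuse the targets stored when the data was collected. The minimal sketch below assumes the ratio is applied as a fixed fraction of each sampled training batch; `split_for_reanalyze` and the batch size of 256 are hypothetical illustrations, not this repository's code.

```python
import random
from typing import List, Tuple


def split_for_reanalyze(batch_indices: List[int], reanalyze_ratio: float) -> Tuple[List[int], List[int]]:
    """Hypothetical helper (not the repo's API): split a sampled batch into
    positions whose targets are recomputed with the latest model and
    positions that reuse the targets stored at collection time."""
    if not 0.0 <= reanalyze_ratio <= 1.0:
        raise ValueError("reanalyze_ratio must be in [0, 1]")
    # Number of positions that get fresh targets, rounded down.
    n_fresh = int(len(batch_indices) * reanalyze_ratio)
    # The batch was sampled in random order, so a prefix is a uniformly random subset.
    fresh = batch_indices[:n_fresh]
    stored = batch_indices[n_fresh:]
    return fresh, stored


# Usage: 256 positions sampled from a replay buffer (batch size is illustrative).
random.seed(0)
batch = random.sample(range(10_000), 256)
policy_fresh, policy_stored = split_for_reanalyze(batch, 0.99)  # policy targets
value_fresh, value_stored = split_for_reanalyze(batch, 1.0)     # value targets
print(len(policy_fresh), len(policy_stored))  # 253 3
print(len(value_fresh), len(value_stored))    # 256 0
```

Under this assumption, a 99% policy ratio leaves only a handful of positions per batch on their stored (older) policy targets.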

Thank you!

Feb 19 '22 23:02 Hwhitetooth

A larger reanalyze ratio can make training more efficient, since more of the targets are recomputed with the latest model.

In practice, there is no significant difference between reanalyzing 99% and 100% of the targets, since the two ratios are so close.
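As a rough worked illustration of how small the gap is: assuming the ratio is applied per batch and rounded down, with an illustrative batch size of B = 256 (an assumption, not a figure from this thread), the number of policy targets left un-reanalyzed per batch is

```latex
B - \lfloor 0.99\,B \rfloor = 256 - \lfloor 253.44 \rfloor = 256 - 253 = 3,
\qquad \frac{3}{256} \approx 1.2\%
```

So moving from 99% to 100% changes the policy targets of only about three positions per batch.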

DeepMind's MuZero Unplugged paper, "Online and Offline Reinforcement Learning by Planning with a Learned Model", discusses the mechanism and efficiency of Reanalyse in detail. If you are interested, please refer to that work.

Hope this helps :)

Apr 29 '22 02:04 YeWR