Recommended change on sac as well as how to inherit a policy

Open ZHZisZZ opened this issue 3 years ago • 0 comments

Change how we transform a distribution. For example, https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py#L816-L822, can be changed to

dist = TransformedDistribution(Independent(Normal(mu, sigma), 1), [TanhTransform()]) 
next_action = dist.rsample()
next_log_prob = dist.log_prob()

This is much easier and more importantly, more numerically stable. 2. I also recommend the practice in mbsac when creating variants of a policy. A lot of copies in configs and __init__learn are not necessary (for example, in SQILSACPolicy) 3. There are two sac papers sac-v1 and sac-v2 from the same search group, I think we should include links to both papers since it is sac-v2 that proposes automatic entropy adjustment. 4. Maybe we can delete value_network instead of hardcoding value_network=False, which is not commonly used and not even used in sac-v2. 5. Maybe we can create a subdirectory for sac instead of squeezing every variant of sac in a single file. Since sac is a very good commonly-used baseline. If a subdirectory is not necessary, at least we should move SACPolicy to the top instead of SACDiscretePolicy, which is not commonly used.

Jun 24 '22 02:06 ZHZisZZ