Question about the transform between true reward and value prefix

Open timothijoe opened this issue 3 years ago • 1 comments

Hi, I am a little confused about how the true reward is obtained from the value prefix in core/ctree/cnode.cpp.

In the function update_tree_q() (Line 256), the true reward is calculated as float true_reward = node->value_prefix - parent_value_prefix

Suppose we have a root, node_1, with two children (node_2 and node_3).

Before the while loop, we push node_1 onto the node_stack. In the first iteration of the while loop, we pop node_1, push node_2 and node_3 onto the node_stack, and finally set parent_value_prefix = node_1.value_prefix.

In the second iteration, we pop node_3 (suppose no child of node_3 has been expanded) and set parent_value_prefix = node_3.value_prefix (Line 281).

In the third iteration, we pop node_2. When we calculate the true reward of node_2 in Line 266, we get true_reward = node_2.value_prefix - parent_value_prefix = node_2.value_prefix - node_3.value_prefix.

However, the parent of node_2 is node_1, so the true_reward should be node_2.value_prefix - node_1.value_prefix. So I wonder whether the handling of the variable "parent_value_prefix" is wrong, or whether I have misunderstood the code.
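To make the walk-through above concrete, here is a minimal sketch of the traversal as I described it, next to one possible fix. The Node struct and both function names are hypothetical simplifications of mine, not the code in cnode.cpp: they leave out the min_max update, the discount, and everything else the real function does, and only collect the computed rewards in pop order.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a tree node; only the fields
// needed for this illustration are kept.
struct Node {
    float value_prefix;
    std::vector<Node*> children;
};

// Traversal as described above: a single shared parent_value_prefix
// that is overwritten after every popped node.
std::vector<float> rewards_shared_variable(Node* root) {
    assert(root != nullptr);
    std::vector<float> rewards;
    std::vector<Node*> node_stack{root};
    float parent_value_prefix = 0.0f;
    while (!node_stack.empty()) {
        Node* node = node_stack.back();
        node_stack.pop_back();
        if (node != root) {
            rewards.push_back(node->value_prefix - parent_value_prefix);
        }
        for (Node* child : node->children) {
            node_stack.push_back(child);
        }
        // Holds the prefix of the last popped node, which is not
        // necessarily the parent of the next popped node.
        parent_value_prefix = node->value_prefix;
    }
    return rewards;
}

// One possible fix (an assumption, not the maintainers' patch):
// store each node together with its own parent's value prefix.
std::vector<float> rewards_paired_with_parent(Node* root) {
    assert(root != nullptr);
    std::vector<float> rewards;
    std::vector<std::pair<Node*, float>> node_stack{{root, 0.0f}};
    while (!node_stack.empty()) {
        Node* node = node_stack.back().first;
        float parent_value_prefix = node_stack.back().second;
        node_stack.pop_back();
        if (node != root) {
            rewards.push_back(node->value_prefix - parent_value_prefix);
        }
        for (Node* child : node->children) {
            node_stack.push_back({child, node->value_prefix});
        }
    }
    return rewards;
}
```

With node_1.value_prefix = 1, node_2.value_prefix = 3 and node_3.value_prefix = 7, the first version gives 6 for node_3 and -4 for node_2, while the second gives 6 and 2, which matches node_2.value_prefix - node_1.value_prefix.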

Admittedly, update_tree_q only updates the min_max value, so this may not affect convergence. Still, I wonder whether training would converge faster if this operation were fixed.

Oct 25 '22 07:10 timothijoe

Thank you for your correction!

You are right. It is a bug that results in wrong min/max values on the tree side. Thank you very much for reading the code so carefully. I think it could affect convergence, stability, or other aspects of training.

We will fix this in the next few days and check the performance :)

Oct 31 '22 03:10 YeWR