Zhiao Huang

3 issues by Zhiao Huang

In her.ipynb, the target model is never updated during training, so the model only learns to maximize the one-step reward.
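To illustrate why a frozen target matters, here is a minimal sketch. It is not the code from her.ipynb: it assumes a tabular Q-function stored in a dict, a toy two-step chain, and hypothetical helper names (`td_target`, `sync_target`, `train`). With a target that stays at its initial zeros, the bootstrap term vanishes and the learned value collapses to the immediate reward; with periodic syncing, the reward propagates back through the chain.

```python
# Hypothetical sketch: Q-values are dicts keyed by (state, action).
GAMMA = 0.9


def td_target(reward, next_state, target_q, actions, done):
    """Bootstrap target: r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward
    return reward + GAMMA * max(target_q[(next_state, a)] for a in actions)


def sync_target(online_q, target_q):
    """Hard update: copy the online values into the target table."""
    target_q.update(online_q)


def train(transitions, actions, epochs, sync_every=None, lr=0.5):
    """Run TD updates; the target is synced only if sync_every is set."""
    states = {t[0] for t in transitions} | {t[3] for t in transitions}
    online_q = {(s, a): 0.0 for s in states for a in actions}
    target_q = dict(online_q)  # starts as an all-zero copy
    for epoch in range(1, epochs + 1):
        for s, a, r, s2, done in transitions:
            y = td_target(r, s2, target_q, actions, done)
            online_q[(s, a)] += lr * (y - online_q[(s, a)])
        if sync_every and epoch % sync_every == 0:
            sync_target(online_q, target_q)
    return online_q


# Toy chain: state 0 -> state 1 -> terminal; reward arrives only at the end.
transitions = [(0, 0, 0.0, 1, False), (1, 0, 1.0, 2, True)]
frozen = train(transitions, actions=[0], epochs=50)
synced = train(transitions, actions=[0], epochs=50, sync_every=1)

print("frozen target, Q(0, 0):", frozen[(0, 0)])
print("synced target, Q(0, 0):", round(synced[(0, 0)], 3))
```

In the frozen run, the value of state 0 never moves away from its one-step reward of zero, while in the synced run it approaches the discounted return `GAMMA * 1.0`.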

https://github.com/wandb/wandb/blob/0a3b035d0fb206570660275503c8b72f8d7b4399/wandb/sdk/data_types/object_3d.py#L22-L208 The description of the input NumPy array shapes is confusing and incorrect.

Please correct me if I am wrong. In the Poplin-P AVG-R example, the `data_dict` passed to the `train` function of `BC_WA_policy.policy_network` contains the noise parameters searched by CEM and they can add...