Zhiao Huang

Results: 2 comments by Zhiao Huang

> > In her.ipynb, the target model is not updated during training. The model only learns to maximize one-step reward.
> >
> > model update had been included in `compute_td_error()` I...

> > > > In her.ipynb, the target model is not updated during training. The model only learns to maximize one-step reward.
> > > >
> > > >...