Zhiao Huang
Results: 2 comments of Zhiao Huang
> > In her.ipynb, the target model is not updated during training. The model only learns to maximize one-step reward.
> >
> > model update had been included in `compute_td_error()` I...
> > > > In her.ipynb, the target model is not updated during training. The model only learns to maximize one-step reward. ...
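
The concern quoted in these comments (a target model that is never updated, so the agent only learns one-step reward) can be illustrated with a minimal tabular sketch. This is not the code from her.ipynb or from `compute_td_error()`; the names `td_target` and `sync_target`, the two-state table, the discount `GAMMA`, and the zero-initialized target are all assumptions made for illustration.

```python
import copy

GAMMA = 0.9  # hypothetical discount factor


def td_target(reward, next_state, target_q, done):
    """One-step TD target: r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + GAMMA * max(target_q[next_state])


def sync_target(online_q):
    """Hard update: return an independent copy of the online table."""
    return copy.deepcopy(online_q)


# Two states, two actions; every value here is made up.
online_q = [[0.0, 0.0], [0.0, 0.0]]
target_q = sync_target(online_q)

# The online table learns that action 1 in state 1 is valuable...
online_q[1] = [0.0, 5.0]

# ...but a target that is never synced still yields the bare reward.
stale = td_target(1.0, 1, target_q, done=False)

# After a sync, the target bootstraps from the learned value.
target_q = sync_target(online_q)
fresh = td_target(1.0, 1, target_q, done=False)
```

Under these assumptions, `stale` carries no information beyond the immediate reward, while `fresh` also includes the discounted value of the next state. Whether the notebook already performs an equivalent sync inside `compute_td_error()`, as the reply suggests, cannot be determined from the truncated previews.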