Mark Sinton

4 comments by Mark Sinton

Yes, it is based on the expected cumulative reward from the environment. For Pendulum, we can never get positive rewards, so it's easy to set Vmax to 0. To find...
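As a rough illustration of the idea in the comment above (not necessarily the approach the truncated comment goes on to describe), here is a minimal sketch of bounding the discounted return from per-step reward bounds. The helper name `return_bounds`, the discount `gamma = 0.99`, and the 200-step horizon are assumptions made for this example. The per-step reward bound assumes the standard Gym Pendulum reward `-(theta^2 + 0.1*theta_dot^2 + 0.001*torque^2)` with angle in [-pi, pi], angular velocity in [-8, 8], and torque in [-2, 2].

```python
import math


def return_bounds(r_min, r_max, gamma, horizon=None):
    """Bound the discounted return given per-step reward bounds.

    Hypothetical helper for illustration only. With horizon=None the
    infinite-horizon geometric sum 1 / (1 - gamma) is used; otherwise
    the finite sum over `horizon` steps is used.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    if horizon is None:
        if gamma >= 1.0:
            raise ValueError("infinite horizon needs gamma < 1")
        scale = 1.0 / (1.0 - gamma)
    elif gamma == 1.0:
        scale = float(horizon)
    else:
        scale = (1.0 - gamma ** horizon) / (1.0 - gamma)
    return r_min * scale, r_max * scale


# Assumed per-step bounds for Pendulum: the reward is never positive,
# and the worst case is hanging down at maximum speed and torque.
pendulum_r_min = -(math.pi ** 2 + 0.1 * 8.0 ** 2 + 0.001 * 2.0 ** 2)
pendulum_r_max = 0.0

# gamma and horizon are assumed values, not taken from the comment.
v_min, v_max = return_bounds(pendulum_r_min, pendulum_r_max,
                             gamma=0.99, horizon=200)
print(f"Vmin ~ {v_min:.1f}, Vmax = {v_max}")
```

Because the best possible per-step reward is 0, the upper bound is 0 regardless of the discount or horizon, which matches the Vmax choice described above. The lower bound from this formula is a worst-case value, so in practice a tighter Vmin may work better.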

Alternatively (or additionally), if you could release Python wheels for the package alongside the source distributions, that would also resolve this.

@shanmsac Would you mind reviewing this minor change, please?

@akidambisrinivasan @brendan-p-lynch @furq-aws Tagging you as recent reviewers on this repo. Would one of you mind taking a look at this change, please?