JaCoderX
@Kismuz, differential risk-adjusted rewards are based on having some moving-average statistic of the previous rewards. So it means that for the first stage (the moving-average initialization period), we don't...
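For reference, a minimal sketch of the kind of moving-average reward I mean (a differential Sharpe-ratio style estimator; `eta` and the class name are just illustrative, not btgym code):

```
class DifferentialSharpe:
    """Differential Sharpe ratio reward (Moody & Saffell style sketch).

    A and B are exponential moving averages of the first and second
    moments of the per-step return; until they have warmed up over
    enough steps the signal is unreliable (the 'first stage' issue).
    """

    def __init__(self, eta=0.01):
        self.eta = eta   # EMA adaptation rate
        self.A = 0.0     # EMA of returns
        self.B = 0.0     # EMA of squared returns

    def step(self, r):
        dA = r - self.A
        dB = r ** 2 - self.B
        var = self.B - self.A ** 2
        # Ill-defined until the moving averages have been initialized:
        d_sharpe = (self.B * dA - 0.5 * self.A * dB) / var ** 1.5 if var > 1e-8 else 0.0
        self.A += self.eta * dA
        self.B += self.eta * dB
        return d_sharpe
```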
Tutorial code is a combination of the TF-Agents `1_dqn_tutorial` and BTgym `unreal_stacked_lstm_strat_4_11`:

```
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import warnings
warnings.filterwarnings("ignore")  # suppress h5py...
```
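The glue between the two is wrapping the BTgym gym-style environment for TF-Agents before building the DQN agent. A rough sketch, assuming default env kwargs (the data file path is a placeholder, and the real notebook passes the full `unreal_stacked_lstm_strat_4_11` config):

```
from btgym import BTgymEnv
from tf_agents.environments import gym_wrapper, tf_py_environment

# BTgym exposes a gym-compatible interface, so it can be wrapped directly.
# Kwargs below are placeholders, not the tutorial's actual strategy config.
btgym_env = BTgymEnv(filename='../data/some_M1_data.csv', verbose=0)

py_env = gym_wrapper.GymWrapper(btgym_env)           # PyEnvironment view
tf_env = tf_py_environment.TFPyEnvironment(py_env)   # TF-Agents training env
```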
I can make a notebook tutorial, but there is some code that needs to be changed for it to work. I thought it would be an issue, this is why...
I opened this issue partially to share my experience and to record the current limitations. Incorrect value function estimation still looks like an open RL research issue. But for sure...
> Implementing SAC would resolve this I think.

I'm playing with the idea of trying to implement SAC for btgym. It might be a bit of a stretch of...
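To scope the work, the core piece would be the entropy-regularized (soft) critic target with twin Q-networks. A bare numpy sketch of just that backup, nothing btgym-specific:

```
import numpy as np

def sac_critic_target(r, done, q1_next, q2_next, log_pi_next, gamma=0.99, alpha=0.2):
    """Soft Bellman backup used to train both SAC critics.

    r           : transition reward
    done        : 1.0 if the episode terminated, else 0.0
    q1_next,
    q2_next     : twin target-critic values Q(s', a'), with a' ~ pi(.|s')
    log_pi_next : log-probability of a' under the current policy
    alpha       : entropy temperature
    """
    soft_q_next = np.minimum(q1_next, q2_next) - alpha * log_pi_next
    return r + gamma * (1.0 - done) * soft_q_next
```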
Ok thanks. I have found this paper, [Learning by Playing – Solving Sparse Reward Tasks from Scratch](https://arxiv.org/pdf/1802.10567.pdf), which has an interesting mechanism that adds auxiliary reward tasks, to add small...
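The paper itself learns separate intention policies with a scheduler, but the part that seems relevant here is the idea of auxiliary reward channels sitting next to the sparse main reward. A loose sketch with made-up signal names (SAC-X-inspired, not the paper's algorithm):

```
def auxiliary_rewards(state):
    # Made-up auxiliary channels for illustration only, evaluated on the
    # same transition as the sparse main reward.
    return {
        'drawdown_penalty': -max(0.0, state['max_value'] - state['broker_value']),
        'exposure_penalty': -abs(state['position_size']),
    }

def combined_reward(main_reward, state, weights):
    # Sparse main reward plus small weighted auxiliary terms.
    aux = auxiliary_rewards(state)
    return main_reward + sum(weights[name] * aux[name] for name in aux)
```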
@Kismuz, on #23 you replied:

> Yes, total reward received is usually bigger than final account value (we should see 'broker value' as we suppose all positions will be forcefully...
@Kismuz, I'm revisiting a few of the key elements of the framework and my current focus is on the reward function. The issue I'm tackling now is the model's 'sensitivity'...
@Kismuz,

> reward scaling is a far better alternative to modifying environment properties

Until now I didn't pay too much attention to the 'reward scaling' param, but because my models are based...
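To illustrate what I mean by sensitivity: a plain multiplicative scale changes the magnitude of every return the value function has to fit. A minimal sketch of the two obvious options, fixed scale vs. running-std normalization (the names here are mine, not btgym's actual param):

```
import numpy as np

class RewardScaler:
    """Multiplicative reward scaling with an optional running-std normalizer."""

    def __init__(self, reward_scale=10.0, normalize=False, eps=1e-8):
        self.reward_scale = reward_scale
        self.normalize = normalize
        self.eps = eps
        self.count, self.mean, self.m2 = 0, 0.0, 0.0  # Welford accumulators

    def __call__(self, raw_reward):
        if self.normalize:
            # Update running mean/variance of the raw reward stream.
            self.count += 1
            delta = raw_reward - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (raw_reward - self.mean)
            std = np.sqrt(self.m2 / max(self.count - 1, 1)) + self.eps
            return self.reward_scale * raw_reward / std
        return self.reward_scale * raw_reward
```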
@frosty00 @kroitor, this PR has been pending review for some time.