PPO Optimizer is incorrect?
Hi.
Not really. The PPO variant in RLMatrix counts 'batchLength' per episode, not per step. This helps with the mental math and avoids changing the policy mid-episode.
@asieradzk do I understand correctly: if my batchSize is 64, does it go over all the data 64 times before applying the first model optimization?
No. The agent will wait until it has collected 64 episodes, after which it will train on the recorded buffer for n epochs and then discard the data.
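The collect-then-train cycle described above can be sketched as follows. This is an illustrative Python sketch, not RLMatrix's actual API; `BATCH_SIZE`, `N_EPOCHS`, and the callback names are assumptions for demonstration.

```python
# Illustrative sketch (not RLMatrix's actual API): PPO-style training where
# batchSize counts whole episodes, not individual steps. The buffer
# accumulates transitions until 64 episodes have finished; only then does
# the optimizer run, for n epochs, and the data is discarded.

BATCH_SIZE = 64   # episodes per update (meaning taken from the discussion)
N_EPOCHS = 4      # PPO epochs per collected batch (hypothetical value)

def train_one_update(run_episode, optimize):
    buffer = []
    for _ in range(BATCH_SIZE):
        buffer.extend(run_episode())   # one full episode of transitions
    for _ in range(N_EPOCHS):
        optimize(buffer)               # reuse the same buffer each epoch
    buffer.clear()                     # data is discarded after the update
```

The key point: no optimization happens mid-collection, and the buffer is thrown away once the epochs finish, matching on-policy PPO semantics.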
In my case an episode is the full data cycle. There is no early stop due to rewards or anything else, so it goes over my entire financial time-series dataset. When batch size is set to 1, it optimizes at each step. NumEpisodes is determined as `return memory.Count(x => x.nextState == null);`, so nextState will be null only at the end of the data.
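The episode-counting check quoted above can be mirrored in a short sketch: an episode boundary is a transition whose `nextState` is null, so counting those transitions counts finished episodes. The `Transition` type below is hypothetical; the field name mirrors the C# snippet.

```python
# Sketch of the episode count described above. In the C# snippet,
# memory.Count(x => x.nextState == null) counts transitions that end an
# episode; here None plays the role of null.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Transition:
    state: int
    nextState: Optional[int]  # None only on the last step of an episode

def num_episodes(memory):
    # Equivalent of: memory.Count(x => x.nextState == null)
    return sum(1 for x in memory if x.nextState is None)
```

With one pass over the whole dataset and no early termination, this count stays at 0 until the final transition, which is why a batch size of 1 episode triggers optimization only at the end of the data.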
@grinay You can join the discord and we can have a discussion there.
If you prefer to talk here, could you clarify? Do you have a single-step episode, where the input is a concatenated financial time series and a decision is made at the end? Buy/sell, I suppose?