PPO Optimizer is incorrect?
Hi.
Not really. The PPO variant in RLMatrix counts 'batchLength' per episode, not per step. This helps with the mental math and avoids changing the policy mid-episode.
@asieradzk do I understand correctly: if my batchSize is 64, does it go over all the data 64 times before applying the first model optimization?
No. The agent will wait until it has collected 64 episodes, after which it will train on the recorded buffer for n epochs and then discard the data.
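The collect-then-train cycle described above can be sketched as follows. This is an illustrative Python sketch, not RLMatrix's actual API; `BATCH_SIZE`, `N_EPOCHS`, and the callback names are assumptions for demonstration.

```python
# Illustrative sketch (not RLMatrix's actual API): PPO-style training where
# batchSize counts whole episodes, not individual steps. The buffer
# accumulates transitions until 64 episodes have finished; only then does
# the optimizer run, for n epochs, and the data is discarded.

BATCH_SIZE = 64   # episodes per update (meaning taken from the discussion)
N_EPOCHS = 4      # PPO epochs per collected batch (hypothetical value)

def train_one_update(run_episode, optimize):
    buffer = []
    for _ in range(BATCH_SIZE):
        buffer.extend(run_episode())   # one full episode of transitions
    for _ in range(N_EPOCHS):
        optimize(buffer)               # reuse the same buffer each epoch
    buffer.clear()                     # data is discarded after the update
```

The key point: no optimization happens mid-collection, and the buffer is thrown away once the epochs finish, matching on-policy PPO semantics.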
In my case an episode is the full data cycle. There is no early stop due to rewards or anything else, so it goes over my entire financial time-series dataset. When batch size is set to 1, it optimizes at each step. NumEpisodes is determined as `return memory.Count(x => x.nextState == null);`, so nextState will be null only at the end of the data.
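The episode-counting check quoted above can be mirrored in a short sketch: an episode boundary is a transition whose `nextState` is null, so counting those transitions counts finished episodes. The `Transition` type below is hypothetical; the field name mirrors the C# snippet.

```python
# Sketch of the episode count described above. In the C# snippet,
# memory.Count(x => x.nextState == null) counts transitions that end an
# episode; here None plays the role of null.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Transition:
    state: int
    nextState: Optional[int]  # None only on the last step of an episode

def num_episodes(memory):
    # Equivalent of: memory.Count(x => x.nextState == null)
    return sum(1 for x in memory if x.nextState is None)
```

With one pass over the whole dataset and no early termination, this count stays at 0 until the final transition, which is why a batch size of 1 episode triggers optimization only at the end of the data.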
@grinay You can join the discord and we can have a discussion there.
If you prefer to talk here, could you clarify? Do you have a single-step episode, where the input is a concatenated financial time series and a decision is made at the end? Buy/sell, I suppose?