PPO Optimizer is incorrect?

Open grinay opened this issue 1 year ago • 5 comments

[Image: screenshot] Hi. Thanks for making this product, I really appreciate it. Take a look at the screenshot; I think there is a mistake. I'm in no hurry to make a PR.

grinay • Apr 04 '25 13:04

Hi.

Not really. The PPO variant in RLMatrix counts 'batchLength' in episodes, not in steps. This keeps the mental math simple when you want to avoid changing the policy mid-episode.

asieradzk • Apr 04 '25 14:04

@asieradzk do I understand correctly: if my batchSize is 64, does it go over all the data 64 times before applying the first model optimization?

grinay • Apr 04 '25 14:04

> @asieradzk do I understand correctly: if my batchSize is 64, does it go over all the data 64 times before applying the first model optimization?

No. The agent waits until it has collected 64 episodes, then trains on the recorded buffer for n epochs and discards the data.
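
A minimal sketch of the behaviour described above, written in Python for brevity rather than in RLMatrix's C#. The class `EpisodeBatchedBuffer`, its fields, and the `update` callback are hypothetical names for illustration and are not RLMatrix's API. The only details taken from this thread are that the batch is counted in finished episodes, that the buffer is reused for n epochs, and that it is then discarded.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# (state, action, reward, next_state); next_state is None on an episode's last step.
Transition = Tuple[int, int, float, Optional[int]]


@dataclass
class EpisodeBatchedBuffer:
    """Hypothetical buffer that triggers an update after `batch_size` finished episodes."""

    batch_size: int  # counted in episodes, not in steps
    n_epochs: int    # how many times the collected buffer is reused per update
    memory: List[Transition] = field(default_factory=list)

    def completed_episodes(self) -> int:
        # An episode is finished when a stored transition has no next state.
        return sum(1 for t in self.memory if t[3] is None)

    def add(self, transition: Transition,
            update: Callable[[List[Transition]], None]) -> bool:
        """Store one step; run the update only once enough episodes have finished."""
        self.memory.append(transition)
        if self.completed_episodes() < self.batch_size:
            return False  # keep collecting, the policy stays unchanged
        for _ in range(self.n_epochs):
            update(self.memory)  # reuse the same buffer for every epoch
        self.memory.clear()      # discard the data after the update
        return True
```

With `batch_size=64`, `add` keeps returning `False` until the 64th terminal transition arrives, regardless of how many steps each episode contains.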

asieradzk • Apr 04 '25 14:04

In my case an episode is one full pass over the data. There is no early stop due to rewards or anything else, so it runs over my entire financial time series dataset. When I set the batch size to 1, it optimises at each step. NumEpisodes is determined as return memory.Count(x => x.nextState == null); so nextState will be null only at the end of the data.
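
To make the consequence for a single long episode concrete, here is a small Python sketch (Python is used only for illustration, and all helper names are hypothetical). It assumes, as stated in this thread, that an update is triggered once the number of transitions with a null next state reaches the batch size, and that one pass over the series produces exactly one such transition. The series length of 1,000 is an arbitrary example value. The sketch models nothing else about RLMatrix and does not reproduce the per-step updates reported for batch size 1.

```python
from typing import List, Optional, Tuple

# (state index, next state index); the next state is None only at the end of the data.
Transition = Tuple[int, Optional[int]]


def one_pass(series_length: int) -> List[Transition]:
    """Transitions from one full walk over the series (one episode)."""
    return [(i, i + 1 if i + 1 < series_length else None)
            for i in range(series_length)]


def num_episodes(memory: List[Transition]) -> int:
    """Python counterpart of memory.Count(x => x.nextState == null)."""
    return sum(1 for _, next_state in memory if next_state is None)


def steps_until_first_update(series_length: int, batch_size: int) -> int:
    """Collect whole passes until enough episodes have finished."""
    memory: List[Transition] = []
    while num_episodes(memory) < batch_size:
        memory.extend(one_pass(series_length))
    return len(memory)


print(steps_until_first_update(series_length=1000, batch_size=64))  # 64000
```

Under these assumptions, a batch size of 64 means 64 full passes over the series are collected before the first update, which is different from training on the same data 64 times.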

grinay • Apr 04 '25 15:04

@grinay You can join the Discord and we can have a discussion there.

If you prefer to talk here, could you clarify? Do you have a single-step episode, where the input is a concatenated financial time series and there is one decision to be made at the end (buy/sell, I suppose)?

asieradzk • Apr 04 '25 16:04