Yi Su comments

40 comments by Yi Su

Implement Decision Transformer for offline RL

> Should we first fix tons of issues for RNN?

Yes, we should. Who is the right person to do it?

Implement Decision Transformer for offline RL

I tried to add a recurrent variant of PPO to `atari_ppo.py` [here](https://github.com/nuance1979/tianshou/blob/ppo_lstm_atari/examples/atari/atari_ppo.py) and [here](https://github.com/nuance1979/tianshou/blob/ppo_lstm_atari/examples/atari/atari_network.py#L188). (Ref: [cleanrl's version](https://github.com/vwxyzjn/cleanrl/pull/83/files).) However, of all the Atari games I tried, only Enduro got a reasonable best reward...
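One detail any recurrent PPO variant has to handle is resetting the recurrent state when an episode ends, so that memory does not leak across episode boundaries in vectorized rollouts. A minimal NumPy sketch, assuming LSTM hidden and cell states of shape `(num_envs, hidden_size)` and a boolean `done` vector; `mask_recurrent_state` is a hypothetical helper, not part of tianshou or cleanrl:

```python
import numpy as np


def mask_recurrent_state(h, c, done):
    """Zero the LSTM hidden/cell state of every env whose last step ended an episode.

    h, c: float arrays of shape (num_envs, hidden_size).
    done: bool array of shape (num_envs,).
    """
    keep = (~done).astype(h.dtype)[:, None]  # 1.0 keeps the state, 0.0 resets it
    return h * keep, c * keep


h = np.ones((3, 4))
c = np.full((3, 4), 2.0)
h, c = mask_recurrent_state(h, c, np.array([False, True, False]))
print(h[1], c[1])  # env 1 just finished an episode, so its state is all zeros
```

Multiplying by a mask instead of branching keeps the update vectorized across environments.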

Implement Decision Transformer for offline RL

Any update on this? @gogoduan

Improve discrete control offline RL benchmark

I managed to convert a shard of the Pong dataset (`Pong/run_1-00000-of-00100`) into a `tianshou.data.ReplayBuffer` and save it to disk in hdf5. However, the hdf5 file is 53GB! As a...
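A back-of-envelope count of raw observation bytes helps put file sizes like this in context. A sketch, assuming 84x84 uint8 frames stacked 4 deep and about 500k transitions per shard, with `obs` and `obs_next` stored as separate arrays (the frame shape and stack depth are assumptions, not stated in the comment; `raw_obs_bytes` is a hypothetical helper):

```python
def raw_obs_bytes(num_transitions, frame_shape=(84, 84), stack=4,
                  bytes_per_pixel=1, store_obs_next=True):
    """Uncompressed bytes for observations alone (actions, rewards, flags excluded)."""
    per_obs = stack * bytes_per_pixel
    for dim in frame_shape:
        per_obs *= dim
    copies = 2 if store_obs_next else 1  # obs and obs_next kept as separate arrays
    return num_transitions * per_obs * copies


print(raw_obs_bytes(500_000) / 1e9, "GB")  # 28.224 GB
```

This only counts observations under the stated assumptions, so it is not meant to reproduce the 53GB figure. It does show that storing `obs_next` separately doubles the footprint, and with frame stacking consecutive observations share most of their frames, so raw storage is highly redundant.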

Improve discrete control offline RL benchmark

> Maybe we can add another way of ReplayBuffer save/restore. I remember the compression algorithm from numpy itself is much more efficient than pickle/hdf5 (according to my experiments at that...
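To compare NumPy's own compression against an uncompressed dump, `np.savez_compressed` can write to an in-memory buffer and the byte counts can be compared directly. A minimal sketch on synthetic Atari-like frames (`npz_size` is a hypothetical helper, and the data is made up for illustration):

```python
import io

import numpy as np


def npz_size(arrays, compressed):
    """Return how many bytes np.savez / np.savez_compressed writes for `arrays`."""
    buf = io.BytesIO()
    (np.savez_compressed if compressed else np.savez)(buf, **arrays)
    return len(buf.getvalue())


# Synthetic data: mostly black 84x84 uint8 frames with one small moving block.
obs = np.zeros((1000, 84, 84), dtype=np.uint8)
for t in range(len(obs)):
    x = t % 80
    obs[t, 40:44, x:x + 4] = 255

raw = npz_size({"obs": obs}, compressed=False)
packed = npz_size({"obs": obs}, compressed=True)
print(f"uncompressed: {raw} bytes, compressed: {packed} bytes")
```

Real Atari frames compress far less dramatically than this synthetic data, so the ratio should be measured on an actual shard before drawing conclusions.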

Improve discrete control offline RL benchmark

I have a script to convert a shard of the RL Unplugged dataset into a `tianshou.data.ReplayBuffer`. Each shard contains about 500k transitions. Now I want to run an experiment with 1M...
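Reaching 1M transitions from shards of about 500k each means combining at least two shards. A library-agnostic NumPy sketch, assuming each shard has already been loaded as a dict of equally long per-field arrays; `merge_shards` and the field names are hypothetical and not part of `tianshou.data.ReplayBuffer`'s API:

```python
import numpy as np


def merge_shards(shards, capacity):
    """Concatenate per-field arrays from several shards, keeping at most `capacity` rows.

    Each shard maps a field name (e.g. "obs", "act", "rew", "done") to an array whose
    first axis indexes transitions. If the total exceeds `capacity`, the oldest
    transitions are dropped, like FIFO eviction in a circular replay buffer.
    """
    keys = shards[0].keys()
    for shard in shards[1:]:
        if shard.keys() != keys:
            raise ValueError("all shards must have the same fields")
    merged = {k: np.concatenate([s[k] for s in shards], axis=0) for k in keys}
    total = len(next(iter(merged.values())))
    start = max(0, total - capacity)
    return {k: v[start:] for k, v in merged.items()}


a = {"act": np.arange(1000), "done": np.zeros(1000, dtype=bool)}
b = {"act": np.arange(1000, 2000), "done": np.zeros(1000, dtype=bool)}
print(len(merge_shards([a, b], capacity=1_000_000)["act"]))  # 2000
```

If a shard can end mid-episode, it may also be worth marking its last transition as an episode end, so that n-step targets do not bridge two unrelated trajectories.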

Improve discrete control offline RL benchmark

> Not sure what happens, could you please send me the code?

Sure. See attachment. I added a `break` here to generate two small buffers with 1000 transitions (otherwise it's...

Improve discrete control offline RL benchmark

> Is it possible to use an empty dataset to reproduce this result? (unrelated to rl-unplugged, because I need quite a long time to download one file...)

I made minimum...

Implement MBPO (#16) and REDQ

> Thanks @Trinkle23897 and @nuance1979 for the discussions and helpful suggestions. I reimplemented MBPO in response to the questions. I'll submit another PR to have a clear view as there...

Implement MBPO (#16) and REDQ

> Based on it, I'm thinking of treating REDQ as a base algorithm and rewriting SAC and TD3 to use a critic ensemble instead of critic 1 & 2. Intrinsically,...
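REDQ forms its target from the minimum over a random subset of M critics drawn from an ensemble of N target critics. With N = M = 2 this is exactly the clipped double-Q minimum that SAC and TD3 take over critic 1 and critic 2, which is why both can be expressed on top of a critic-ensemble base. A minimal NumPy sketch of just that minimization step (`redq_min_target` is a hypothetical name; a full SAC-style target would further add the reward, the discount, and the entropy term):

```python
import numpy as np


def redq_min_target(target_qs, m, rng):
    """Minimum over a random subset of `m` target critics.

    target_qs: array of shape (N, batch), one row of Q(s', a') values per target critic.
    m: subset size, 1 <= m <= N. One subset is drawn per call and shared by the batch.
    """
    n = target_qs.shape[0]
    if not 1 <= m <= n:
        raise ValueError(f"m must be in [1, {n}], got {m}")
    subset = rng.choice(n, size=m, replace=False)
    return target_qs[subset].min(axis=0)


rng = np.random.default_rng(0)
# N = M = 2 reduces to the clipped double-Q minimum used by SAC and TD3.
q_pair = np.array([[1.0, 5.0], [3.0, 2.0]])
print(redq_min_target(q_pair, m=2, rng=rng))  # [1. 2.]
```

With a larger N and a small M, the target is drawn from a different critic subset at each update, which is the knob REDQ adds on top of the two-critic setup.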