nicklhy
I noticed that the current shuffle operation uses `temp_blob_` as the intermediate variable for the memory copy. Why can't we just pass top_data as the "output" argument of `Resize_gpu`?
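A minimal NumPy sketch of the two memory-copy patterns in question, for illustration only: `temp_blob_`, `Resize_gpu`, and top_data are the project's own identifiers, while the `channel_shuffle` helper and everything else below are hypothetical stand-ins, not the actual CUDA implementation.

```python
import numpy as np

def channel_shuffle(bottom, groups):
    """Illustrative channel shuffle: interleave the `groups` channel groups of an (N, C, H, W) array."""
    n, c, h, w = bottom.shape
    k = c // groups
    # reshape to (N, groups, K, H, W), swap the group axis with the per-group channel axis,
    # then flatten back to (N, C, H, W)
    return bottom.reshape(n, groups, k, h, w).transpose(0, 2, 1, 3, 4).reshape(n, c, h, w)

bottom = np.arange(2 * 6 * 1 * 1, dtype=np.float32).reshape(2, 6, 1, 1)

# Two-step pattern described in the issue: shuffle into an intermediate buffer
# (playing the role of temp_blob_), then copy that buffer into the output blob.
temp = channel_shuffle(bottom, groups=3)
top_data = np.empty_like(bottom)
top_data[...] = temp  # the extra memory copy the question is about

# One-step pattern the question proposes: write the shuffled result directly into the output.
top_data_direct = channel_shuffle(bottom, groups=3)

assert np.array_equal(top_data, top_data_direct)
```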
Thanks a lot for sharing this awesome project! I really want to try the training part, but didn't find any experiment details in the README or your...
The `update_net` function of the `AgentPPOHterm` class parses the trajectory that `AgentPPO.explore_one_env` generated earlier in `train_and_evaluate` (and since PPO is an on-policy algorithm, the trajectory is not put into a replay buffer and re-sampled). However, `AgentPPOHterm.update_net` parses the trajectory in a format that does not match how it was stored (and is also inconsistent with `AgentPPO.update_net`), which looks like a bug.

Trajectory parsing in `AgentPPOHterm.update_net`: https://github.com/AI4Finance-Foundation/ElegantRL/blob/1d5bf9e1639222c5d2a462adcc0c4eab453bbe70/elegantrl/agents/AgentPPO.py#L671

Trajectory format produced by `AgentPPO.explore_one_env`: https://github.com/AI4Finance-Foundation/ElegantRL/blob/1d5bf9e1639222c5d2a462adcc0c4eab453bbe70/elegantrl/agents/AgentPPO.py#L92

Trajectory parsing in `AgentPPO.update_net`: https://github.com/AI4Finance-Foundation/ElegantRL/blob/1d5bf9e1639222c5d2a462adcc0c4eab453bbe70/elegantrl/agents/AgentPPO.py#L139

Besides this ordering problem, the two fields `buf_mask` and `buf_noise` that `AgentPPOHterm.update_net` expects do not seem to correspond directly to `undones` and `logprobs`. Is this part of the implementation perhaps unfinished?
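To make the kind of mismatch being reported easier to see, here is a generic, hypothetical sketch of a trajectory whose fields are unpacked in a different order than they were stored; the field names and ordering below are invented for illustration and are not taken from ElegantRL.

```python
from typing import NamedTuple
import torch

class Trajectory(NamedTuple):
    """Hypothetical trajectory layout an exploration function might return."""
    states: torch.Tensor
    actions: torch.Tensor
    logprobs: torch.Tensor
    rewards: torch.Tensor
    undones: torch.Tensor  # 1.0 while the episode keeps running, 0.0 at termination

def explore_one_env(num_steps=4, state_dim=3, action_dim=2) -> Trajectory:
    """Produce a dummy trajectory in one fixed field order."""
    return Trajectory(
        states=torch.zeros(num_steps, state_dim),
        actions=torch.zeros(num_steps, action_dim),
        logprobs=torch.zeros(num_steps),
        rewards=torch.zeros(num_steps),
        undones=torch.ones(num_steps),
    )

def update_net_consistent(buffer: Trajectory) -> None:
    # Unpacking in the same order the trajectory was stored: every name binds to the right tensor.
    states, actions, logprobs, rewards, undones = buffer
    assert undones.shape == rewards.shape

def update_net_mismatched(buffer: Trajectory) -> None:
    # Unpacking in a different order than the trajectory was stored: the code still runs,
    # but every name silently binds to the wrong tensor, which is the kind of bug described above.
    states, actions, rewards, undones, logprobs = buffer
    # here `undones` actually holds the rewards tensor and `logprobs` holds undones

traj = explore_one_env()
update_net_consistent(traj)
update_net_mismatched(traj)
```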
The documentation shows that this project implements a very rich set of agent algorithms and adapters for different types of Env, but the summarized benchmark results seem quite limited and many results are missing. For example, [Atari](https://xuance.readthedocs.io/zh/latest/documents/benchmark/atari.html), MPE and MAgent show no experimental results at all, and the only MuJoCo results available are incomplete: they cover just four algorithms (DDPG, TD3, A2C, PPO) and do not distinguish between the PyTorch, TF and MindSpore backend implementations.
## Habitat-Lab and Habitat-Sim versions

Habitat-Lab: 0.3.1
Habitat-Sim: 0.3.1

## 🐛 Bug

I tried to run rearrange/rl_hierarchical_oracle_nav_human.yaml, but the script just crashed with file-not-found errors. After looking into...
### Is there an existing issue for the same bug?

- [X] I have checked the existing issues.

### Branch name

main

### Commit ID

e1e5711680d64054233c8b96845f1f4eec7e46d4

### Other environment information...