libowen424
I succeeded with the following configuration: ` set -x export PATH=$HOME/.local/bin/:$PATH ray job submit --address="http://127.0.0.1:8265" \ --runtime-env-json='{"working_dir": "/openrlhf", "pip": "/openrlhf/requirements.txt"}' \ -- python3 examples/train_ppo_ray.py \ --ref_num_nodes 1 \ --ref_num_gpus_per_node 1...
Actually, I found that after running `self.model.zero_grad()`, `loss.backward()`, and `data_grad = data.grad.data`, `data_grad` is NaN. Your code optimizes the model, not the `data`, which is weird.
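For anyone trying to reproduce this, here is a minimal sketch of reading the gradient with respect to the input in PyTorch and checking it for NaN. The model, batch, and loss below are hypothetical stand-ins (a small `nn.Linear` classifier with cross-entropy), not the code from this thread, so it only shows the mechanics and not the cause of the NaN.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the model and batch discussed above.
model = nn.Linear(4, 2)
data = torch.randn(3, 4)
target = torch.tensor([0, 1, 0])

# The input must track gradients, otherwise data.grad stays None
# after backward() and only the model parameters receive gradients.
data.requires_grad_(True)

loss = nn.functional.cross_entropy(model(data), target)

model.zero_grad()
loss.backward()

# Gradient of the loss with respect to the input, detached from the graph.
data_grad = data.grad.detach()

# Check whether any element of the input gradient is NaN.
has_nan = bool(torch.isnan(data_grad).any())
print("data_grad shape:", tuple(data_grad.shape), "has NaN:", has_nan)
```

If `has_nan` comes back true in the real setup, wrapping the forward and backward pass in `torch.autograd.set_detect_anomaly(True)` may help locate the operation that first produces the NaN.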
> Thanks, I will try to find a way to solve this too. Perhaps it's actually due to a limitation of the framework-level implementation.