
[BUG]RuntimeError: stack expects each tensor to be equal size, but got [] at entry 0 and [1] at entry 6

Open GeekTemo opened this issue 5 months ago • 2 comments

When using torchrl's SyncDataCollector with a custom environment, an error occurs whenever the `_step()` method returns `True` for "done":

```
/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py:870: UserWarning: total_frames (1000) is not exactly divisible by frames_per_batch (30). This means 20 additional frames will be collected. To silence this message, set the environment variable RL_WARNINGS to False.
  warnings.warn(
/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py:1429: UserWarning: An output with one or more elements was resized since it had shape [], which does not match the required output shape [1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/Resize.cpp:38.)
  traj_ids = traj_ids.masked_scatter(traj_sop, new_traj)
Traceback (most recent call last):
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 1586, in rollout
    result = torch.stack(
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in __torch_function__
    return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
    out.stack_onto(list_of_tensordicts, dim)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
    new_dest = torch.stack(
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in __torch_function__
    return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
    out.stack_onto(list_of_tensordicts, dim)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
    new_dest = torch.stack(
RuntimeError: stack expects each tensor to be equal size, but got [] at entry 0 and [1] at entry 6

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_model.py", line 240, in <module>
    example2()
  File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_model.py", line 232, in example2
    fine_tunning_model(ds, task_id, model_url_, url_,
  File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_model.py", line 213, in fine_tunning_model
    fint_tunning(ft_data, model_path, json_tokenizer_file, save_dir)
  File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_with_rl_v2.py", line 480, in fint_tunning
    for epoch, data in enumerate(collector):
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 341, in __iter__
    yield from self.iterator()
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 1256, in iterator
    tensordict_out = self.rollout()
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/_utils.py", line 661, in unpack_rref_and_invoke_function
    return func(self, *args, **kwargs)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 1594, in rollout
    result = torch.stack(
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in __torch_function__
    return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
    out.stack_onto(list_of_tensordicts, dim)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
    new_dest = torch.stack(
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in __torch_function__
    return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
    out.stack_onto(list_of_tensordicts, dim)
  File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
    new_dest = torch.stack(
RuntimeError: stack expects each tensor to be equal size, but got [] at entry 0 and [1] at entry 6
```

Through debugging, I found that when the TensorDicts are stacked, the error is raised for the key "traj_ids". If "done" is False, everything works fine.
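The shape mismatch named in the error can be reproduced with plain `torch.stack`: a 0-dim tensor (shape `[]`) cannot be stacked with shape-`[1]` tensors. A minimal sketch (the tensors below are illustrative assumptions, not the actual torchrl internals):

```python
import torch

# Most entries are 0-dim (shape []), while the terminating step produced
# shape [1] -- the same mix reported at entry 0 vs. entry 6 in the traceback.
dones = [torch.tensor(False) for _ in range(6)] + [torch.tensor([True])]

try:
    torch.stack(dones)
except RuntimeError as e:
    print(e)  # stack expects each tensor to be equal size, but got [] ... [1] ...

# Giving every tensor the same shape [1] lets the stack succeed:
dones_fixed = [d.reshape(1) for d in dones]
print(torch.stack(dones_fixed).shape)  # torch.Size([7, 1])
```

So a likely fix on the environment side is to make the custom `_step()` always return "done" (and the other leaf tensors) with the shape declared in the env's specs, typically with a trailing singleton dimension such as `[1]`, rather than a 0-dim scalar.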

GeekTemo · Aug 21 '25 09:08