
Droid Dataset Finetuning action dimension mismatch

Open AsDeadAsADodo opened this issue 6 months ago • 4 comments

Hi, thanks for the great work! I'm trying to fine-tune the Pi0 base model on the Droid dataset. I defined a TrainConfig as follows:

    TrainConfig(
        name="pi0_droid_finetune",
        model=pi0.Pi0Config(
            paligemma_variant="gemma_2b_lora",
            action_expert_variant="gemma_300m_lora",
            action_dim=8,
            action_horizon=16,
            max_token_len=180,
        ),
        # The freeze filter defines which parameters should be frozen during training.
        # We have a convenience function in the model config that returns the default freeze filter
        # for the given model config for LoRA finetuning. Just make sure it matches the model config
        # you chose above.
        freeze_filter=pi0.Pi0Config(
            paligemma_variant="gemma_2b_lora",
            action_expert_variant="gemma_300m_lora",
            action_dim=8,
            action_horizon=16,
            max_token_len=180,
        ).get_freeze_filter(),
        # Turn off EMA for LoRA finetuning.
        ema_decay=None,
        data=RLDSDroidDataConfig(
            repo_id="droid",
            # Set this to the path to your DROID RLDS dataset (the parent directory of the `droid` directory).
            rlds_data_dir="/root/autodl-tmp/",
            action_space=droid_rlds_dataset.DroidActionSpace.JOINT_POSITION,
        ),
        weight_loader=weight_loaders.CheckpointWeightLoader("/root/autodl-tmp/openpi/pi0_base/params"),
        num_train_steps=30_000,
        num_workers=0,
    )

After computing the norm stats, I ran the following command:

uv run --group rlds scripts/train.py pi0_droid_finetune --exp-name=my_experiment --overwrite

I got a dimension mismatch error:

Traceback (most recent call last):
  File "/root/autodl-tmp/openpi/scripts/train.py", line 281, in <module>
    main(_config.cli())
  File "/root/autodl-tmp/openpi/scripts/train.py", line 237, in main
    train_state, train_state_sharding = init_train_state(config, init_rng, mesh, resume=resuming)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/openpi/.venv/lib/python3.11/site-packages/jaxtyping/_decorator.py", line 559, in wrapped_fn
    return wrapped_fn_impl(args, kwargs, bound, memos)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/openpi/.venv/lib/python3.11/site-packages/jaxtyping/_decorator.py", line 483, in wrapped_fn_impl
    out = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/openpi/scripts/train.py", line 122, in init_train_state
    partial_params = _load_weights_and_validate(config.weight_loader, train_state_shape.params.to_pure_dict())
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/openpi/scripts/train.py", line 76, in _load_weights_and_validate
    at.check_pytree_equality(expected=params_shape, got=loaded_params, check_shapes=True, check_dtypes=True)
  File "/root/autodl-tmp/openpi/src/openpi/shared/array_typing.py", line 87, in check_pytree_equality
    jax.tree_util.tree_map_with_path(check, expected, got)
  File "/root/autodl-tmp/openpi/.venv/lib/python3.11/site-packages/jax/_src/tree_util.py", line 1183, in tree_map_with_path
    return treedef.unflatten(f(*xs) for xs in zip(*all_keypath_leaves))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/autodl-tmp/openpi/.venv/lib/python3.11/site-packages/jax/_src/tree_util.py", line 1183, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_keypath_leaves))
                             ^^^^^^
  File "/root/autodl-tmp/openpi/src/openpi/shared/array_typing.py", line 82, in check
    raise ValueError(f"Shape mismatch at {jax.tree_util.keystr(kp)}: expected {x.shape}, got {y.shape}")
ValueError: Shape mismatch at ['action_in_proj']['kernel']: expected (8, 1024), got (32, 1024)

The Droid dataset requires an action dimension of 8, but the checkpoint seems to use 32.

AsDeadAsADodo avatar Jul 19 '25 02:07 AsDeadAsADodo

@AsDeadAsADodo The pi-zero model is trained with a maximum action dimension of 32, which exceeds the actual DoF of most single-arm robots, so its action vectors contain a lot of zero padding. This is done to make the model compatible with hardware configurations of varying DoF, which is why 32 is far more than any one robot needs. For your case, you need to keep this max dimension at 32 so it matches the checkpoint's output, and configure your effective dimension of 8 elsewhere (in the data transforms).

In my case I am using LeRobot's PyTorch port, so I set self.config.action_feature.shape[0] to your dimension of 8 and kept self.config.max_action_dim=32. You may need to find out how the original script configures this. Hope my experience helps.
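The pad-to-max-dim idea above can be sketched in a few lines. This is a minimal NumPy illustration, not openpi's or LeRobot's actual API; the helper names and dimension constants are hypothetical:

```python
import numpy as np

MODEL_ACTION_DIM = 32  # max action dimension the checkpoint was trained with
ROBOT_ACTION_DIM = 8   # effective DoF of the robot (e.g. 7 joints + gripper)

def pad_actions(actions: np.ndarray, target_dim: int) -> np.ndarray:
    """Zero-pad the trailing (action) dimension up to target_dim."""
    pad_width = [(0, 0)] * (actions.ndim - 1) + [(0, target_dim - actions.shape[-1])]
    return np.pad(actions, pad_width)

def unpad_actions(actions: np.ndarray, effective_dim: int) -> np.ndarray:
    """Keep only the first effective_dim values of each predicted action."""
    return actions[..., :effective_dim]

chunk = np.random.randn(16, ROBOT_ACTION_DIM)       # (action_horizon, 8)
padded = pad_actions(chunk, MODEL_ACTION_DIM)       # (16, 32), zeros in dims 8..31
restored = unpad_actions(padded, ROBOT_ACTION_DIM)  # (16, 8), round-trips exactly
```

Training pads the 8-dim actions up to the model's 32, and inference truncates the 32-dim predictions back down, so the extra dimensions carry no information.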

Tonghe-Zhang avatar Jul 19 '25 12:07 Tonghe-Zhang

Thanks! That makes sense. I added some code to the DroidInputs class in DroidPolicy.py to achieve a similar effect. However, I’m still wondering, is this the right way to do it?

        if "actions" in data:
            # We are padding to the model action dim.
            # For pi0-FAST, this is a no-op (since action_dim = 7).
            actions = transforms.pad_to_dim(data["actions"], self.action_dim)
            inputs["actions"] = actions
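
For this padding to line up with the checkpoint, the model config would also keep the checkpoint's 32-dim action space. A sketch of the relevant fragment of the TrainConfig above (the freeze_filter's Pi0Config would need the same change):

    model=pi0.Pi0Config(
        paligemma_variant="gemma_2b_lora",
        action_expert_variant="gemma_300m_lora",
        action_dim=32,  # must match the checkpoint's action_in_proj kernel: (32, 1024)
        action_horizon=16,
        max_token_len=180,
    ),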

AsDeadAsADodo avatar Jul 19 '25 12:07 AsDeadAsADodo

Hello, I have also encountered the same problem. Do you have a better solution now?

sunmoon2018 avatar Oct 31 '25 10:10 sunmoon2018

> Hello, I have also encountered the same problem. Do you have a better solution now?

Sadly, no. My focus has shifted away from VLA.

AsDeadAsADodo avatar Oct 31 '25 12:10 AsDeadAsADodo