
How to Convert Human Demonstrations into LeRobot Format for GR00T-N1

Open MaggieK410 opened this issue 8 months ago • 19 comments

Hello, thank you for this very nice, simple-to-use simulator!

I am currently trying to create my own dataset to test GR00T-N1 on and am using the GR1ArmsOnly robot for human data collection via the human demonstration script. This has worked fine so far, but I am now trying to make the outputs I get from the human demonstrations compatible with what GR00T-N1 needs as input (the LeRobot v2.0 data format). I am still struggling to convert everything, but I have seen conversion code for other simulators. Do you know if anything like this is available for robosuite, or do you have some pointers on how to achieve this?

Thanks in advance!

MaggieK410 avatar Jun 05 '25 21:06 MaggieK410

Hi, unfortunately we do not have specific scripts to convert our datasets for GR00T-N1 finetuning. However, I do have a script which converts robocasa datasets (which are very similar to robosuite datasets) to LeRobot format. It can be found here. Hope this helps!
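For reference, a minimal sketch of what such a conversion loop can look like. Assumptions (not from this thread): a robomimic-style hdf5 layout of data/<demo>/{obs,actions}, and the lerobot v2.x API (LeRobotDataset.create / add_frame / save_episode); exact signatures shift between lerobot releases, so treat this as a starting point rather than the script itself:

# Minimal hdf5 -> LeRobot conversion sketch (see assumptions above).
import h5py
import numpy as np
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

with h5py.File("demo.hdf5", "r") as f:
    state_dim = f["data/demo_0/obs/robot0_joint_pos"].shape[-1]
    action_dim = f["data/demo_0/actions"].shape[-1]

    dataset = LeRobotDataset.create(
        repo_id="your_user/your_task",  # hypothetical repo id
        fps=20,                         # match your control frequency
        features={
            "observation.state": {"dtype": "float32", "shape": (state_dim,), "names": None},
            "action": {"dtype": "float32", "shape": (action_dim,), "names": None},
        },
    )
    for demo in f["data"]:
        states = f[f"data/{demo}/obs/robot0_joint_pos"][:]   # (T, n_joints)
        actions = f[f"data/{demo}/actions"][:]               # (T, action_dim)
        for t in range(len(actions)):
            dataset.add_frame({
                "observation.state": states[t].astype(np.float32),
                "action": actions[t].astype(np.float32),
            })
        dataset.save_episode(task="your task description")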

Abhiram824 avatar Jun 05 '25 21:06 Abhiram824

Thanks for the quick response! I'll look into it and close this issue now

MaggieK410 avatar Jun 05 '25 21:06 MaggieK410

I have now tried to follow the code you sent and combine it with robosuite. However, GR00T-N1 expects GR1ArmsOnly to have 7 DOF per arm and 6 per hand. When I use the print_robot_action_info.py script, it reports the right dimensions; however, my human demonstration collections have 6 DOF per arm and 6 per hand.

As I understand it, the difference comes from WholeBody._action_split_indexes (which has the 26 dimensions) and WholeBody._whole_body_controller_action_split_indexes (which has the 24 dimensions). I believe this comes from the IKSolver, as the whole-body controller split indexes are defined via WholeBody._whole_body_controller_action_split_indexes.update(WholeBody.joint_action_policy.action_split_indexes()), and WholeBody.joint_action_policy.action_split_indexes() is defined by an IKSolver object.

This leads to the ik_utils.py file, where the control dimension is defined as:

self.input_action_repr = input_action_repr
self.input_rotation_repr = input_rotation_repr
ROTATION_REPRESENTATION_DIMS: Dict[str, int] = {"quat_wxyz": 4, "axis_angle": 3}
self.rot_dim = ROTATION_REPRESENTATION_DIMS[input_rotation_repr]  # 3 for "axis_angle"
self.pos_dim = 3
self.control_dim = len(self.site_names) * (self.pos_dim + self.rot_dim)  # (3 + 3) per site
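To make the arithmetic concrete (a worked check, not robosuite code): with input_rotation_repr="axis_angle" and one control site per arm, each site contributes 3 position + 3 rotation dimensions, i.e. 6 per arm:

# Worked check of the control_dim computation above.
ROTATION_REPRESENTATION_DIMS = {"quat_wxyz": 4, "axis_angle": 3}

site_names = ["gripper0_right_grip_site", "gripper0_left_grip_site"]  # one per arm
pos_dim = 3
rot_dim = ROTATION_REPRESENTATION_DIMS["axis_angle"]  # 3

control_dim = len(site_names) * (pos_dim + rot_dim)
print(control_dim)  # 12 -> 6 per arm; adding the 2 x 6 hand dims gives the 24 total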

My question now is: why? How do the 6 action dimensions per arm map onto the 7 original arm joints: ['robot0_r_shoulder_pitch', 'robot0_r_shoulder_roll', 'robot0_r_shoulder_yaw', 'robot0_r_elbow_pitch', 'robot0_r_wrist_yaw', 'robot0_r_wrist_roll', 'robot0_r_wrist_pitch', 'robot0_l_shoulder_pitch', 'robot0_l_shoulder_roll', 'robot0_l_shoulder_yaw', 'robot0_l_elbow_pitch', 'robot0_l_wrist_yaw', 'robot0_l_wrist_roll', 'robot0_l_wrist_pitch']?

Sorry if this is a very basic question, but without understanding this, I'm not sure how I can convert the outputs so that they fit with GR00T.

MaggieK410 avatar Jun 06 '25 18:06 MaggieK410

I just saw this post, and it seems the 6 dimensions come from the action space being 6D poses rather than joint positions. Is that the reason? If so, is there any way I can convert the actions back to joint angles?

MaggieK410 avatar Jun 06 '25 18:06 MaggieK410

Hi, correct, the 6-DOF action space for the arms comes from the fact that the actions for the arms are 6D poses when collecting human demonstrations. The joint state of the robot at each timestep should be stored in the observation data in your robosuite dataset. If not, you can extract these observations from the raw dataset as done here.
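For example, something along these lines (a sketch assuming the robomimic-style layout data/<demo>/obs; adjust the filename and keys to your dataset):

# Read per-timestep joint angles from each demonstration.
import h5py

with h5py.File("demo.hdf5", "r") as f:
    for demo in f["data"]:
        joint_pos = f[f"data/{demo}/obs/robot0_joint_pos"][:]  # shape (T, n_joints)
        print(demo, joint_pos.shape)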

Abhiram824 avatar Jun 10 '25 03:06 Abhiram824

@Abhiram824 Hi, I have a follow-up question. I'm trying to connect gr00t with the robocasa simulation environment. In the robocasa dataset they provide, the modality is set like this:

{
    "state": {
        "base_position": {
            "start": 0,
            "end": 3
        },
        "base_rotation": {
            "start": 3,
            "end": 7,
            "rotation_type": "quaternion"
        },
        "end_effector_position_absolute": {
            "start": 7,
            "end": 10
        },
        "end_effector_position_relative": {
            "start": 10,
            "end": 13
        },
        "end_effector_rotation_absolute": {
            "start": 13,
            "end": 17,
            "rotation_type": "quaternion"
        },
        "end_effector_rotation_relative": {
            "start": 17,
            "end": 21,
            "rotation_type": "quaternion"
        },
        "gripper_qpos": {
            "start": 21,
            "end": 23
        },
        "gripper_qvel": {
            "start": 23,
            "end": 25
        },
        "joint_position": {
            "start": 25,
            "end": 32
        },
        "joint_position_cos": {
            "start": 32,
            "end": 39
        },
        "joint_position_sin": {
            "start": 39,
            "end": 46
        },
        "joint_velocity": {
            "start": 46,
            "end": 53
        }
    },
    "action": {
        "base_motion": {
            "start": 0,
            "end": 4,
            "absolute": false
        },
        "control_mode": {
            "start": 4,
            "end": 5,
            "dtype": "int64",
            "range": [
                0,
                1
            ]
        },
        "end_effector_position": {
            "start": 5,
            "end": 8,
            "absolute": false
        },
        "end_effector_rotation": {
            "start": 8,
            "end": 11,
            "absolute": false,
            "rotation_type": "axis_angle"
        },
        "gripper_close": {
            "start": 11,
            "end": 12,
            "dtype": "int64",
            "range": [
                0,
                1
            ]
        }
    },

However, the robosuite outputs the observation of state having the following state dict:

dict_keys(['robot0_agentview_left_image', 'robot0_agentview_right_image', 'robot0_eye_in_hand_image', 'object', 'robot0_joint_pos', 'robot0_joint_pos_cos', 'robot0_joint_pos_sin', 'robot0_joint_vel', 'robot0_eef_pos', 'robot0_eef_quat', 'robot0_eef_quat_site', 'robot0_gripper_qpos', 'robot0_gripper_qvel', 'robot0_base_pos', 'robot0_base_quat', 'robot0_base_to_eef_pos', 'robot0_base_to_eef_quat', 'robot0_base_to_eef_quat_site'])

I know the first three are camera views, but how can I tell which robosuite state key maps to which state key in gr00t's modality config?
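My current guess, purely from the key names (so please correct me if any of these are wrong), is the mapping below; the left-hand names are from gr00t's modality config, and the right-hand keys are the hypothetical robosuite counterparts:

# Name-based guess at the gr00t modality key -> robosuite obs key mapping
# (unverified; inferred purely from the key names above).
STATE_KEY_MAP = {
    "base_position":                  "robot0_base_pos",
    "base_rotation":                  "robot0_base_quat",
    "end_effector_position_absolute": "robot0_eef_pos",
    "end_effector_position_relative": "robot0_base_to_eef_pos",
    "end_effector_rotation_absolute": "robot0_eef_quat",
    "end_effector_rotation_relative": "robot0_base_to_eef_quat",
    "gripper_qpos":                   "robot0_gripper_qpos",
    "gripper_qvel":                   "robot0_gripper_qvel",
    "joint_position":                 "robot0_joint_pos",
    "joint_position_cos":             "robot0_joint_pos_cos",
    "joint_position_sin":             "robot0_joint_pos_sin",
    "joint_velocity":                 "robot0_joint_vel",
}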

Jerryisqx avatar Jun 17 '25 12:06 Jerryisqx

@Abhiram824 Hi, I have a follow-up question. I'm trying to connect gr00t with the robocasa simulation environment. [...] I know the first three are camera views, but how can I tell which robosuite state key maps to which state key in gr00t's modality config?

Hi, I am not 100% sure since I did not work on the GR00T codebase; this might be a better question for the GR00T repo or for @kevin-thankyou-lin, who is more familiar with the GR00T code.

Abhiram824 avatar Jun 18 '25 04:06 Abhiram824

Hi, do you have any datasets for robosuite v1.5? I converted some of robomimic's datasets to LeRobot format here: https://huggingface.co/datasets/brandonyang/stack_d0

But after training with pi0, it seems like it does not evaluate well with robosuite v1.5, possibly due to a version mismatch, as the robomimic data was generated with v1.2.

Any help is appreciated!

branyang02 avatar Jun 30 '25 04:06 branyang02

Hi do you have any datasets for robosuite v1.5? I converted some robomimc's dataset to lerobot format here: https://huggingface.co/datasets/brandonyang/stack_d0

But after training with pi0, it seems like it does not evaluate well with robosuitev1.5, possible due to a version mismatch as robomimic is generated with v1.2.

Any help is appreciated!

Hi, the robomimic datasets have been converted to be compatible with robosuite v1.5 and can be found on huggingface.

Abhiram824 avatar Jun 30 '25 05:06 Abhiram824

Another question regarding this. I figured out that GR00T outputs joint positions for the hand and arm (https://github.com/NVIDIA/Isaac-GR00T/issues/180). In short:

7 arm joints: shoulder pitch, shoulder roll, shoulder yaw, elbow pitch, wrist yaw, wrist roll, wrist pitch.

6 finger joints: little finger, ring finger, middle finger, index finger, thumb rotation, thumb bending.
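For reference, my reading of the resulting 26-dim action layout (the left/right and arm/hand ordering here is my assumption, not confirmed; it should be checked against print_robot_action_info.py):

# Assumed layout of the 26-dim GR1ArmsOnly action vector:
# 2 x 7 arm joints + 2 x 6 finger joints = 26.
ACTION_SPLIT = {
    "right_arm":  (0, 7),
    "left_arm":   (7, 14),
    "right_hand": (14, 20),
    "left_hand":  (20, 26),
}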

Now there are two ways to integrate this with GR1ArmsOnly: either I make a new BASIC controller, or I use the default_gr1.json, which uses the WholeBodyIK controller.

When using BASIC, the movements are jittery and noisy. I think it might be related to some hyperparameters, but I tried different settings and could not resolve the issue.

When using WholeBodyIK, the gripper (specifically, the finger joint movements) seems to be ignored, but the arm moves perfectly to the target. I am now trying to integrate the finger movements into WholeBodyIK somehow, but I'm not sure how. Do you have any suggestions on how to integrate the gripper's fingers into the WholeBodyIK controller?

Here is the config I am using:

{
    "type": "WHOLE_BODY_IK",
    "composite_controller_specific_configs": {
        "ref_name": ["gripper0_right_grip_site", "gripper0_left_grip_site"],
        "interpolation": null,
        "actuation_part_names": ["torso", "head", "right", "left", "base", "legs"],
        "max_dq": 4,
        "nullspace_joint_weights": {
            "robot0_torso_waist_yaw": 100.0,
            "robot0_torso_waist_pitch": 100.0,
            "robot0_torso_waist_roll": 500.0,
            "robot0_l_shoulder_pitch": 4.0,
            "robot0_r_shoulder_pitch": 4.0,
            "robot0_l_shoulder_roll": 3.0,
            "robot0_r_shoulder_roll": 3.0,
            "robot0_l_shoulder_yaw": 2.0,
            "robot0_r_shoulder_yaw": 2.0
        },
        "ik_pseudo_inverse_damping": 5e-2,
        "ik_integration_dt": 1e-1,
        "ik_max_dq": 4.0,
        "ik_max_dq_torso": 0.2,
        "ik_input_type": "absolute",
        "ik_input_ref_frame": "base",
        "ik_input_rotation_repr": "axis_angle",
        "ik_verbose": true
    },
    "body_parts": {
        "arms": {
            "right": {
                "type" : "JOINT_POSITION",
                "input_max": 1,
                "input_min": -1,
                "input_type": "absolute",
                "output_max": 0.5,
                "output_min": -0.5,
                "kd": 200,
                "kv": 200,
                "kp": 1000,
                "velocity_limits": [-1,1],
                "kp_limits": [0, 1000],
                "interpolation": null,
                "ramp_ratio": 0.2,
                "gripper": {
                    "type" : "GRIP",
                    "input_max": 1,
                    "input_min": -1,
                    "input_type": "absolute",
                    "output_max": 0.5,
                    "output_min": -0.5,
                    "kd": 200,
                    "kv": 200,
                    "kp": 1000,
                    "velocity_limits": [-1,1],
                    "kp_limits": [0, 1000],
                    "interpolation": null,
                    "ramp_ratio": 0.2,
                    "use_action_scaling": false
                }
            },
            "left": {
                "type" : "JOINT_POSITION",
                "input_max": 1,
                "input_min": -1,
                "input_type": "absolute",
                "output_max": 0.5,
                "output_min": -0.5,
                "kd": 200,
                "kv": 200,
                "kp": 1000,
                "velocity_limits": [-1,1],
                "kp_limits": [0, 1000],
                "interpolation": null,
                "ramp_ratio": 0.2,
                "gripper": {
                    "type" : "GRIP",
                    "input_max": 1,
                    "input_min": -1,
                    "input_type": "absolute",
                    "output_max": 0.5,
                    "output_min": -0.5,
                    "kd": 200,
                    "kv": 200,
                    "kp": 1000,
                    "velocity_limits": [-1,1],
                    "kp_limits": [0, 1000],
                    "interpolation": null,
                    "ramp_ratio": 0.2,
                    "use_action_scaling": false
                }
            }
        },
        "torso": {
            "type" : "JOINT_POSITION",
            "input_max": 1,
            "input_min": -1,
            "input_type": "absolute",
            "output_max": 0.5,
            "output_min": -0.5,
            "kd": 200,
            "kv": 200,
            "kp": 1000,
            "velocity_limits": [-1,1],
            "kp_limits": [0, 1000],
            "interpolation": null,
            "ramp_ratio": 0.2
        },
        "head": {
            "type" : "JOINT_POSITION",
            "input_max": 1,
            "input_min": -1,
            "input_type": "absolute",
            "output_max": 0.5,
            "output_min": -0.5,
            "kd": 200,
            "kv": 200,
            "kp": 1000,
            "velocity_limits": [-1,1],
            "kp_limits": [0, 1000],
            "interpolation": null,
            "ramp_ratio": 0.2
        },
        "base": {
            "type": "JOINT_VELOCITY",
            "interpolation": "null"
        },
        "legs": {
            "type": "JOINT_POSITION",
            "input_max": 1,
            "input_min": -1,
            "output_max": 0.5,
            "output_min": -0.5,
            "kd": 200,
            "kv": 200,
            "kp": 1000,
            "velocity_limits": [-1,1],
            "kp_limits": [0, 1000],
            "interpolation": null,
            "ramp_ratio": 0.2
        }
    }
}

Should I define the grippers as JOINT_POSITION instead of GRIP? Do they have to be outside of the arms section for that to work? Any insights are appreciated!

MaggieK410 avatar Jun 30 '25 09:06 MaggieK410

Hi, the robomimic datasets have been converted to be compatible with robosuite v1.5 and can be found on huggingface.

Thanks for the info! I would also like to know whether the robomimic data supports camera observations as well. I tried looking into the huggingface robomimic data and can't seem to find any camera observations.

branyang02 avatar Jul 01 '25 05:07 branyang02

Thanks for the info! I would also like to know whether the robomimic data supports camera observations as well. I tried looking into the huggingface robomimic data and can't seem to find any camera observations.

Hi, all downloaded files that don't have low_dim in their name will contain camera observations. Example: https://huggingface.co/datasets/amandlek/robomimic/blob/main/v1.5/can/mg/demo_v15.hdf5
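If in doubt, a quick way to check a downloaded file (a sketch; adjust the filename, and note that some raw files only store flattened mujoco states, in which case the obs group will be absent until you extract observations):

# Inspect the first demo in a robomimic hdf5 file and list any image keys.
import h5py

with h5py.File("demo_v15.hdf5", "r") as f:
    demo = sorted(f["data"].keys())[0]
    print(list(f[f"data/{demo}"].keys()))  # e.g. ['actions', 'obs', 'states', ...]
    if "obs" in f[f"data/{demo}"]:
        print([k for k in f[f"data/{demo}/obs"] if k.endswith("_image")])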

Abhiram824 avatar Jul 01 '25 18:07 Abhiram824

Another question regarding this. I figured out that GR00T outputs joint positions for the hand and arm (NVIDIA/Isaac-GR00T#180). [...] Should I define the grippers as JOINT_POSITION instead of GRIP? Do they have to be outside of the arms section for that to work? Any insights are appreciated!

@kevin-thankyou-lin would you happen to know how to define a JOINT_POSITION controller for the gripper and use WHOLE_BODY_IK for the rest of the GR1?

Abhiram824 avatar Jul 01 '25 18:07 Abhiram824

Something like:

                "gripper": {
                    "type" : "JOINT_POSITION",
                     ...
                }

This'll switch the controller here:

def gripper_controller_factory(name, params):
    interpolator = None
    if name == "GRIP":
        return gripper_controllers.SimpleGripController(interpolator=interpolator, **params)
    elif name == "JOINT_POSITION":
        return generic.JointPositionController(interpolator=interpolator, **params)
    raise ValueError("Unknown controller name: {}".format(name))

kevin-thankyou-lin avatar Jul 01 '25 20:07 kevin-thankyou-lin

Hi, all downloaded files that don't have low_dim in their name will contain camera observations. Example: https://huggingface.co/datasets/amandlek/robomimic/blob/main/v1.5/can/mg/demo_v15.hdf5

Correct me if I'm wrong, but for files that are not low_dim, I'm getting these keys for each episode:

<KeysViewHDF5 ['actions', 'interventions', 'policy_acting', 'states', 'user_acting']>

And actions and states are simply the concatenated raw states. I was hoping I could extract individual observations based on keys and camera views, for example if I only want the agentview or the gripper view, and only the relative position + quaternion + gripper state as the observation.

branyang02 avatar Jul 02 '25 03:07 branyang02

Thanks for this @kevin-thankyou-lin, this is what I needed. I have changed my config so that the gripper uses a joint position controller now. However, it seems there is an issue when I do this. Firstly, the joint indexes passed to the gripper controller are the joint names, not the indices. I can fix this manually by adding:

if not isinstance(joint_indexes["joints"][0], int):
    # map joint names to joint ids so downstream indexing works
    self.joint_index = [
        self.sim.model.joint_name2id(joint_name) for joint_name in joint_indexes["joints"]
    ]

However, then the action dimensions no longer match. With GRIP, each hand contributes 6 action elements, but there are 11 joints in a hand, so switching the grippers to JOINT_POSITION changes the expected total from 26 (2 × (7 arm + 6 hand)) to 36 (2 × (7 arm + 11 hand)). This leads to the error:

assert len(action) == self.action_dim, "environment got invalid action dimension -- expected {}, got {}".format(
AssertionError: environment got invalid action dimension -- expected 26, got 36

Is there a setting I am missing here, or is there a problem with using joint positions for grippers?

Here is my config for reference:

{
  "type": "BASIC",
  "composite_controller_specific_configs": {},
  "control_delta": false,
  "body_parts": {
    "right": {
      "type": "JOINT_POSITION",
      "input_max":  3.14159,
      "input_min": -3.14159,
      "output_max": 3.14159,
      "output_min": -3.14159,
      "kd": 200,
      "kv": 200,
      "kp": 10000,
      "interpolation": null,
      "input_type": "absolute",
      "ramp_ratio": 0.0,
      "gripper": {
                    "type" : "JOINT_POSITION",
                    "input_max": 3.14159,
                    "input_min": -3.14159,
                    "input_type": "absolute",
                    "output_max": 3.14159,
                    "output_min": -3.14159,
                    "kd": 200,
                    "kv": 200,
                    "kp": 10000,
                    "interpolation": null,
                    "ramp_ratio": 0.0,
                    "use_action_scaling": false
                }
    },
    "left": {
      "type": "JOINT_POSITION",
      "input_max": 3.14159,
      "input_min": -3.14159,
      "output_max": 3.14159,
      "output_min": -3.14159,
      "kd": 200,
      "kv": 200,
      "kp": 10000,
      "interpolation": null,
      "input_type": "absolute",
      "ramp_ratio": 0.0,
      "gripper": {
                    "type" : "JOINT_POSITION",
                    "input_max": 3.14159,
                    "input_min": -3.14159,
                    "input_type": "absolute",
                    "output_max": 3.14159,
                    "output_min": -3.14159,
                    "kd": 200,
                    "kv": 200,
                    "kp": 10000,
                    
                    
                    "interpolation": null,
                    "ramp_ratio": 0.0,
                    "use_action_scaling": false
                }
    }
  }
}

Any help would be deeply appreciated!

MaggieK410 avatar Jul 10 '25 15:07 MaggieK410

Correct me if I'm wrong, but for files that are not low_dim, I'm getting these keys for each episode: <KeysViewHDF5 ['actions', 'interventions', 'policy_acting', 'states', 'user_acting']>. And actions and states are simply the concatenated raw states. I was hoping I could extract individual observations based on keys and camera views [...]

If you want to extract custom observations, then this script in the robomimic repo might suit your use case: you can specify camera views and image sizes through the args, and it will extract both absolute and relative actions as well as the relevant proprioceptive data.
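For example, an invocation along these lines (the script path and flags are from my memory of the robomimic repo and may differ between versions, so check --help first):

# Re-render observations (including camera images) from a raw states file.
import subprocess

subprocess.run([
    "python", "robomimic/scripts/dataset_states_to_obs.py",
    "--dataset", "demo_v15.hdf5",           # raw, states-only download
    "--output_name", "demo_v15_obs.hdf5",   # new file with extracted obs
    "--camera_names", "agentview", "robot0_eye_in_hand",
    "--camera_height", "84",
    "--camera_width", "84",
], check=True)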

Abhiram824 avatar Jul 10 '25 18:07 Abhiram824

Thanks for this @kevin-thankyou-lin, this is what I needed. I have changed my config so that the gripper uses a joint position controller now. [...] Is there a setting I am missing here, or is there a problem with using joint positions for grippers?

I found that using JOINT_POSITION for the Fourier hands does not work well out of the box, because the finger actuators are joint-position actuators, while the joint position controller outputs torques (which works fine for motor actuators but not for position actuators). I changed the joint position controller to pass the desired qpos directly when the part name is a gripper, and now this setup works. One thing that still concerns me is that the parameters passed for the fingers (i.e., kd, kp, kv) are ignored and the defaults are used instead, which is confusing. As a quick fix I changed the default settings, which is probably not desirable. Is this intended behavior?
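For reference, the core of my workaround looks roughly like this (a hypothetical helper with names of my own, not robosuite's actual code):

# MuJoCo position-servo actuators interpret ctrl as a target joint position,
# while motor actuators interpret it as a torque; write the right quantity.
def write_ctrl(sim, actuator_ids, target_qpos, torques, use_position_servo):
    if use_position_servo:
        sim.data.ctrl[actuator_ids] = target_qpos  # position servo: qpos target
    else:
        sim.data.ctrl[actuator_ids] = torques      # motor: torque command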

MaggieK410 avatar Jul 11 '25 07:07 MaggieK410

Apologies for the delay; the gripper controller config used in the GR00T-N1 experiments should be:

"gripper": { "type": "GRIP", "use_action_scaling": false }

kevin-thankyou-lin avatar Aug 01 '25 18:08 kevin-thankyou-lin