Genesis icon indicating copy to clipboard operation
Genesis copied to clipboard

[Bug]: Discontinuous IK solutions.

Open HeegerGao opened this issue 5 months ago • 13 comments

Bug Description

Hi, I am using Genesis for mobile manipulation. I implemented a simple mobile manipulator (a base with a Franka arm). It works normally with joint space control.

However, when I want to use the end effector Cartesian space PD control (e.g., I want the end effector to move forward for 0.01m and keep the base unmoved), the IK seems to return a wrong qpos.

Specifically, in the following script, I have a mobile_franka robot, where the base has 3 dofs: moving left, right, and rotating, and the arm is a Franka Panda arm. I want to control the robot arm to move forward for 0.01m every step and keep the base unmoved. I set the arm_delta_action = torch.tensor([0.01, 0, 0, 0, 0, 0], device="cuda"), and solve the IK to get the target_qpos, where I have set the dofs_idx_local to only count for the robot arm dofs (that is [3,4,5,6,7,8,9]).

from copy import deepcopy
import sys

import torch
sys.path.append(".")
sys.path.append("..")
import numpy as np

import genesis as gs

from transform_utils import (
    euler_angles_to_matrix,
    matrix_to_quaternion,
    quaternion_multiply,
    quaternion_invert,
    Pose,
)

def get_pos(ee_link):
    return ee_link.get_pos()

def get_quat(ee_link):
    return ee_link.get_quat()

def get_pose(ee_link):
    return Pose.create_from_pq(ee_link.get_pos(), ee_link.get_quat())

def get_ee_link(entity, ee_link_name="hand"):
    return entity.get_link(ee_link_name)

gs.init(backend=gs.gpu)

scene = gs.Scene(
    viewer_options = gs.options.ViewerOptions(
        camera_pos    = (0, -3.5, 2.5),
        camera_lookat = (0.0, 0.0, 0.5),
        camera_fov    = 30,
        res           = (960, 640),
        max_FPS       = 60,
    ),
    sim_options = gs.options.SimOptions(
        dt = 0.01,
    ),
    show_viewer = True,
    show_FPS=True,
)

cam = scene.add_camera(
    res    = (960, 640),
    pos    = (0, -3.5, 2.5),
    lookat = (0.0, 0.0, 0.5),
    fov    = 30,
    GUI    = False
)

plane = scene.add_entity(
    gs.morphs.Plane(),
)
mobile_franka = scene.add_entity(
    gs.morphs.MJCF(
        file  = 'mobile/tidybot_panda.xml',
    ),
    material=gs.materials.Rigid(gravity_compensation=1.0),
)

scene.build()

scene.visualizer._context.add_light(
    {
        "type": "directional",  # point, directional, spot
        "dir": [0, 0.7071, -0.7071],  # Directional light pointing upwards
        "pos": [0, 0, 0],  # Position is not used for directional light
        "color": [1.0, 1.0, 1.0],
        "intensity": 6.0,
    }
)

jnt_names = [
    'joint_x',
    'joint_y',
    'joint_th',
    'joint1',
    'joint2',
    'joint3',
    'joint4',
    'joint5',
    'joint6',
    'joint7',
    'finger_joint1',
    'finger_joint2',
]
dofs_idx = [mobile_franka.get_joint(name).dof_idx_local for name in jnt_names]

mobile_franka.set_dofs_kp(
    kp             = np.array([600000, 600000, 30000, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000]),
    dofs_idx_local = dofs_idx,
)

mobile_franka.set_dofs_kv(
    kv             = np.array([90000, 90000, 5000, 100,  100,  100,  100,  100,  100,  100, 100, 100]),
    dofs_idx_local = dofs_idx,
)

mobile_franka.set_dofs_force_range(
    lower          = np.array([-10000, -10000, -10000, -100, -100, -100, -100, -100, -100, -100, -100, -100]),
    upper          = np.array([10000, 10000, 10000, 100,  100,  100,  100,  100,  100,  100,  100,  100]),    
    dofs_idx_local = dofs_idx,
)

cam.start_recording()

for i in range(10):
    # hard set to a good init pose
    mobile_franka.set_dofs_position(
        np.array([0, 0, 0, 0.0, -np.pi / 4, 0, -3 * np.pi / 4, 0, np.pi / 2, np.pi / 4, 0.04, 0.04]),
        dofs_idx,
    )
    scene.step()

ee_pose = get_pose(get_ee_link(mobile_franka))
init_eepose = deepcopy(ee_pose)

for i in range(200):
    prev_ee_pose = ee_pose

    arm_delta_action = torch.tensor([0.01, 0, 0., 0, 0, 0], device="cuda")   # xyzrpy

    delta_pos, delta_rot = arm_delta_action[:3], arm_delta_action[3:6]
    delta_quat = matrix_to_quaternion(euler_angles_to_matrix(delta_rot, "XYZ"))
    p = ee_pose.p + delta_pos
    q = quaternion_multiply(delta_quat, ee_pose.q)

    target_qpos = mobile_franka.inverse_kinematics(
        link=get_ee_link(mobile_franka),
        pos=p[0],
        # quat=q[0],
        quat=init_eepose.q[0],
        init_qpos=mobile_franka.get_dofs_position(dofs_idx),
        max_solver_iters=100,
        dofs_idx_local=dofs_idx[3:10],
        # rot_mask=[False, False, False]
        # pos_mask=[False, False, False],
    )
    mobile_franka.control_dofs_position(
        target_qpos,
        dofs_idx,
    )

    ee_pose = get_pose(get_ee_link(mobile_franka))

    print(f"Delta End effector pose: {ee_pose.p - prev_ee_pose.p}", f"{ee_pose.q - prev_ee_pose.q}")

    scene.step()
    cam.render()
    
cam.stop_recording(save_to_filename='video.mp4', fps=60)

The result is very strange. The robot arm starts to twist:

https://github.com/user-attachments/assets/86efed0f-f94e-422e-914f-f6c34d4533c9

I have one more experiment to show that it is indeed the problem of the IK: If I set the action to all zero: arm_delta_action = torch.tensor([0, 0, 0, 0, 0, 0], device="cuda"), the target_qpos is correct, and the whole robot just keeps still.

Steps to Reproduce

Scripts, utils, and model files are attached. Just run test_genesis_mobile.py.

genesis_issue.zip

Expected Behavior

As stated above.

Screenshots/Videos

No response

Relevant log output


Environment

  • OS: Windows 11
  • GPU RTX 4090

Release version or Commit ID

commit ID: aa797a9, [BUG FIX] Fix Inverse Kinematics algorithm (cont'd).

Additional Context

No response

HeegerGao avatar Aug 20 '25 11:08 HeegerGao

I cannot run your script because transform_utils missing. Try relying on genesis.utils.geom instead.

Also, you should try replacing control_dofs_position by set_dofs_position. The result will not be physical anymore, but it will sort apart inverse kinematic issue from bad tracking. I suspect the issue you are observing is a combination of "discontinuous" solutions for IK (which is not a bug strictly speaking) and poor motor tracking due to PID gains that are too soft.

duburcqa avatar Aug 22 '25 13:08 duburcqa

Closing for now. I will re-open if you can provide additional information that suggests there is an actual bug.

duburcqa avatar Aug 22 '25 13:08 duburcqa

I cannot run your script because transform_utils missing. Try relying on genesis.utils.geom instead.

Also, you should try replacing control_dofs_position by set_dofs_position. The result will not be physical anymore, but it will sort apart inverse kinematic issue from bad tracking. I suspect the issue you are observing is a combination of "discontinuous" solutions for IK (which is not a bug strictly speaking) and poor motor tracking due to PID gains that are too soft.

transform_utils.py is in my attached zip file. I think it is straight forward to see it:

Image

HeegerGao avatar Aug 22 '25 14:08 HeegerGao

transform_utils.py is in my attached zip file. I think it is straight forward to see it:

OK, somehow the script did not find it when I first tried. May try again. My point is still valid though: you should not need any external transform utils. The one provided with Genesis are already highly optimised and thoroughly tested. Anyway, try following my advice regarding your simulation and let me know if it helps.

duburcqa avatar Aug 24 '25 08:08 duburcqa

Hi, I’m seeing the same behavior and I can reproduce the issue.

Changes (per your advice)

  • Switched to genesis.utils.geom (no external transform utils).
  • Replaced control_dofs_position(...) with set_dofs_position(...).

Result and problem

  • I've attached the video below. With set_dofs_position, tracking is kind of ideal.
  • However, IK still shows discontinuous solutions (branch flips) across tiny steps, even when warm-starting from previous qpos.

My fixed codes

from copy import deepcopy
import sys

import torch
sys.path.append(".")
sys.path.append("..")
import numpy as np

import genesis as gs
from genesis.utils import geom

def get_pos(ee_link):
    return ee_link.get_pos()

def get_quat(ee_link):
    return ee_link.get_quat()

def get_pose(ee_link):
    pos = ee_link.get_pos()
    quat = ee_link.get_quat()
    return pos, quat

def get_ee_link(entity, ee_link_name="hand"):
    return entity.get_link(ee_link_name)

gs.init(backend=gs.gpu)

scene = gs.Scene(
    viewer_options = gs.options.ViewerOptions(
        camera_pos    = (0, -3.5, 2.5),
        camera_lookat = (0.0, 0.0, 0.5),
        camera_fov    = 30,
        res           = (960, 640),
        max_FPS       = 60,
    ),
    sim_options = gs.options.SimOptions(
        dt = 0.01,
        gravity = (0, 0, -9.8),
    ),
    show_viewer = False,
    show_FPS=True,
)

cam = scene.add_camera(
    res    = (960, 640),
    pos    = (0, -3.5, 2.5),
    lookat = (0.0, 0.0, 0.5),
    fov    = 30,
    GUI    = False
)

plane = scene.add_entity(
    gs.morphs.Plane(),
)
mobile_franka = scene.add_entity(
    gs.morphs.MJCF(
        file  = 'mobile/tidybot_panda.xml',
    ),
    material=gs.materials.Rigid(gravity_compensation=1.0),
)

scene.build()

scene.visualizer._context.add_light(
    {
        "type": "directional",  # point, directional, spot
        "dir": [0, 0.7071, -0.7071],  # Directional light pointing upwards
        "pos": [0, 0, 0],  # Position is not used for directional light
        "color": [1.0, 1.0, 1.0],
        "intensity": 6.0,
    }
)

jnt_names = [
    'joint_x',
    'joint_y',
    'joint_th',
    'joint1',
    'joint2',
    'joint3',
    'joint4',
    'joint5',
    'joint6',
    'joint7',
    'finger_joint1',
    'finger_joint2',
]
dofs_idx = [mobile_franka.get_joint(name).dof_idx_local for name in jnt_names]

mobile_franka.set_dofs_kp(
    kp             = np.array([600000, 600000, 30000, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000]),
    dofs_idx_local = dofs_idx,
)

mobile_franka.set_dofs_kv(
    kv             = np.array([90000, 90000, 5000, 100,  100,  100,  100,  100,  100,  100, 100, 100]),
    dofs_idx_local = dofs_idx,
)

mobile_franka.set_dofs_force_range(
    lower          = np.array([-10000, -10000, -10000, -100, -100, -100, -100, -100, -100, -100, -100, -100]),
    upper          = np.array([10000, 10000, 10000, 100,  100,  100,  100,  100,  100,  100,  100,  100]),    
    dofs_idx_local = dofs_idx,
)

cam.start_recording()

for i in range(10):
    # hard set to a good init pose
    mobile_franka.set_dofs_position(
        np.array([0, 0, 0, 0.0, -np.pi / 4, 0, -3 * np.pi / 4, 0, np.pi / 2, np.pi / 4, 0.04, 0.04]),
        dofs_idx,
    )
    scene.step()
    cam.render()

ee_pos, ee_quat = get_pose(get_ee_link(mobile_franka))
init_ee_pos = deepcopy(ee_pos)
init_ee_quat = deepcopy(ee_quat)

for i in range(200):
    prev_ee_pos = ee_pos
    prev_ee_quat = ee_quat

    arm_delta_action = torch.tensor([0.01, 0, 0., 0, 0, 0], device="cuda")   # xyzrpy

    delta_pos, delta_rot = arm_delta_action[:3], arm_delta_action[3:6]
    
    delta_quat = geom.xyz_to_quat(delta_rot)
    target_pos = ee_pos + delta_pos
    print(f"target_pos: {target_pos}")
    target_quat = geom.transform_quat_by_quat(delta_quat, ee_quat)
    print(f"target_quat: {target_quat}")

    target_qpos = mobile_franka.inverse_kinematics(
        link=get_ee_link(mobile_franka),
        pos=target_pos,
        quat=init_ee_quat, 
        init_qpos=mobile_franka.get_dofs_position(dofs_idx),
        max_solver_iters=100,
        dofs_idx_local=dofs_idx[3:10],
    )
    
    # use set_dofs_position instead of control_dofs_position
    mobile_franka.set_dofs_position(
        target_qpos,
        dofs_idx,
    )

    ee_pos, ee_quat = get_pose(get_ee_link(mobile_franka))

    print(f"Delta End effector pose: {ee_pos - prev_ee_pos}", f"{ee_quat - prev_ee_quat}")

    scene.step()
    cam.render()
    
cam.stop_recording(save_to_filename='video.mp4', fps=60)

The video is attached here:

https://github.com/user-attachments/assets/266138ee-9b02-4a6d-8635-7d1069df87dc

Steps to Reproduce

Scripts and model files are attached. Just run test_genesis_mobile_fixed.py.

genesis_issue.zip

HuangXuchuan avatar Aug 26 '25 07:08 HuangXuchuan

Perfect, thank you. We will investigate this issue when we have time.

duburcqa avatar Aug 28 '25 06:08 duburcqa

Hi, just checking in to see if there's any progress on this issue.

HeegerGao avatar Sep 08 '25 08:09 HeegerGao

It also seems that using the set_dofs_position() work pretty well in my code. When using the IK result with the pd controllers inside the control api, the manipulation is super unstable. And interestingly, different platform(cuda and mps) running the same code script in the same seed for numpy and genesis will yield different IK results at the beginning, and sometimes cuda work well and sometimes mps work well under the control_dofs_position() function. I may check if anything related with these manipulation is not functioning correctly in the future.

CybDing avatar Sep 15 '25 08:09 CybDing

When using the IK result with the pd controllers inside the control api, the manipulation is super unstable.

That is because it is not how it should be used. The solution of IK should NOT be used as input for control_dofs_position. Doing this on a real robot would probably completely destroy it if using high gains, which is usually the case unless coupled with RL policy. One should rather call plan_path instead to make sure the motion is smooth and continuous.

duburcqa avatar Sep 15 '25 08:09 duburcqa

When using the IK result with the pd controllers inside the control api, the manipulation is super unstable.

That is because it is not how it should be used. The solution of IK should NOT be used as input for control_dofs_position. Doing this on a real robot would probably completely destroy it if using high gains, which is usually the case unless coupled with RL policy. One should rather call plan_path instead to make sure the motion is smooth and continuous.

Thank you very much, I am going to test if planing path before executing the actions could work.

CybDing avatar Sep 15 '25 10:09 CybDing

Actually the issue is I use the same seed(including Genesis, and numpy), for executing the same code block to initialise the robot home_pos from the original setting defined in the Urdf file, and I find that the the IK results for the initial pose are always different between the one ran on my Mac, and the one running one the server using cuda.

So this directly cause the issue when I execute future actions(really small step) on the server starting from a not perfect configuration, the robot will first move into the configuration that seems to be the one on my Mac, and start rolling out. I don't know why the solver will give a different result in the same setting at the beginning, and try to move the arm back to another configuration.

Now, it's more stable when I use the same home_pos pre-calculated(finding that this is a good starting point), and the rolling out like moving into a new position will not need to flip the configuration on the server. This in some sense, could be an alternative solution so far for me.

CybDing avatar Sep 15 '25 15:09 CybDing

Hi, these days I have investigated about this problem and I've found an interesting sentence in your codes:

rigid_entity._func_get_jacobian(
                    tgt_link_idx=i_l_ee,
                    i_b=i_b,
                    p_local=ti.Vector.zero(gs.ti_float, 3),
                    pos_mask=pos_mask,
                    rot_mask=rot_mask,
                    dofs_info=dofs_info,
                    joints_info=joints_info,
                    links_info=links_info,
                    links_state=links_state,
                )  # NOTE: we still compute jacobian for all dofs as we haven't found a clean way to implement this

It's in rigid_solver_decomp.py, line 3562.

The model I have used is a combination of a base and an arm. By printing all of the result, I found the results of the model and the only arm model are different, which shows that although it seems I have denoted the dofs, the computation of Jacobin is still for all dofs. Furthermore, the results show that, while computing the Jacobian of other dofs, the results of the dofs I need is contaminated. Here is the result:

  Calculating translation for joint 5 i_d_jac: 4
  rotation: [-0.000000, 1.000000, 0.000000]
  tgt_link_pos: [0.306891, -0.000000, 0.925282]
  self._solver.links_state.pos[i_l, i_b]: [0.000000, 0.000000, 0.668000]
  tgt_link_pos - self._solver.links_state.pos[i_l, i_b]: [0.306891, -0.000000, 0.257282]
  Calculating translation for joint 4 i_d_jac: 3
  rotation: [0.000000, 0.000000, 1.000000]
  tgt_link_pos: [0.306891, -0.000000, 0.925282]
  self._solver.links_state.pos[i_l, i_b]: [0.000000, 0.000000, 0.668000]
  tgt_link_pos - self._solver.links_state.pos[i_l, i_b]: [0.306891, -0.000000, 0.257282]
  Calculating translation for joint 3 i_d_jac: 4
  rotation: [0.000000, 0.000000, 1.000000]
  tgt_link_pos: [0.306891, -0.000000, 0.925282]
  self._solver.links_state.pos[i_l, i_b]: [0.000000, 0.000000, 0.000000]
  tgt_link_pos - self._solver.links_state.pos[i_l, i_b]: [0.306891, -0.000000, 0.925282]

It's the result of the jacobian calculation in rigid_entity.py, _func_get_jacobian, line 997-1009

You can see that, the joint5's result is on i_d_jac=4, but the joint3's result is also on the same position, which covers the correct result.

I don't know my solution of this problem is "clean" or not. I just added a check before calculating the jacobian. If the dof does not need to calculate, it will jump to the next dof, and this already makes the result much understandable.

   elif joints_info.type[I_j] == gs.JOINT_TYPE.REVOLUTE:
        i_d = joints_info.dof_start[I_j]
        I_d = [i_d, i_b] if ti.static(self.solver._options.batch_dofs_info) else i_d
        i_d_jac = i_d + dof_offset - self._dof_start
        if self.dof_allowed[i_d]:
            rotation = gu.ti_transform_by_quat(dofs_info.motion_ang[I_d], links_state.quat[i_l, i_b])
            translation = rotation.cross(tgt_link_pos - links_state.pos[i_l, i_b])
        
            self._jacobian[0, i_d_jac, i_b] = translation[0] * pos_mask[0]
            self._jacobian[1, i_d_jac, i_b] = translation[1] * pos_mask[1]
            self._jacobian[2, i_d_jac, i_b] = translation[2] * pos_mask[2]
            self._jacobian[3, i_d_jac, i_b] = rotation[0] * rot_mask[0]
            self._jacobian[4, i_d_jac, i_b] = rotation[1] * rot_mask[1]
            self._jacobian[5, i_d_jac, i_b] = rotation[2] * rot_mask[2]

https://github.com/user-attachments/assets/01e0c892-54bf-4bd2-9339-92cec39099e7

HuangXuchuan avatar Nov 05 '25 15:11 HuangXuchuan

Any fix on this?

anashoussaini avatar Nov 12 '25 05:11 anashoussaini