[Bug Report] During training the models disappear from the scene
Describe the bug
My custom robot model disappears after some time of training, without any errors. What can I do to solve the problem?
[INFO][AppLauncher]: Loading experience file: /home/andrea/.local/share/ov/pkg/IsaacLab-2.0.2/apps/isaaclab.python.rendering.kit
Loading user config located at: '/home/andrea/.local/share/ov/pkg/isaac-sim-4.5.0/kit/data/Kit/Isaac-Sim/4.5/user.config.json'
[Info] [carb] Logging to file: /home/andrea/.local/share/ov/pkg/isaac-sim-4.5.0/kit/logs/Kit/Isaac-Sim/4.5/kit_20250410_163224.log
2025-04-10 14:32:24 [0ms] [Warning] [omni.kit.app.plugin] No crash reporter present, dumps uploading isn't available.
2025-04-10 14:32:24 [573ms] [Warning] [carb.windowing-glfw.gamepad] Joystick with unknown remapping detected (will be ignored): FrSky FrSky Simulator [03000000830400002057000000010000]
|---------------------------------------------------------------------------------------------|
| Driver Version: 550.120 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
| | | | | | Bus-ID | |
|---------------------------------------------------------------------------------------------|
| 0 | NVIDIA GeForce RTX 4070 | Yes: 0 | | 12282 MB | 10de | 0 |
| | | | | | 2786 | 5d9cd715.. |
| | | | | | 2d | |
|=============================================================================================|
| OS: 22.04.5 LTS (Jammy Jellyfish) ubuntu, Version: 22.04.5, Kernel: 6.8.0-57-generic
| XServer Vendor: The X.Org Foundation, XServer Version: 12101004 (1.21.1.4)
| Processor: AMD Ryzen 9 5950X 16-Core Processor
| Cores: 16 | Logical Cores: 32
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 64216 | Free Memory: 55529
| Total Page/Swap (MB): 130757 | Free Page/Swap: 130757
|---------------------------------------------------------------------------------------------|
2025-04-10 14:32:25 [1,056ms] [Warning] [gpu.foundation.plugin] IOMMU is enabled.
2025-04-10 14:32:26 [2,048ms] [Warning] [omni.log] Source: omni.hydra was already registered.
2025-04-10 14:32:26 [2,197ms] [Warning] [omni.isaac.dynamic_control] omni.isaac.dynamic_control is deprecated as of Isaac Sim 4.5. No action is needed from end-users.
2025-04-10 14:32:28 [3,750ms] [Warning] [omni.replicator.core.scripts.extension] No material configuration file, adding configuration to material settings directly.
2025-04-10 14:32:28 [4,485ms] [Warning] [omni.kit.menu.utils.app_menu] add_menu_items: menu [<MenuItemDescription name:'New'>, <MenuItemDescription name:'Open'>, <MenuItemDescription name:'Re-open with New Edit Layer'>, <MenuItemDescription name:'Save'>, <MenuItemDescription name:'Save With Options'>, <MenuItemDescription name:'Save As...'>, <MenuItemDescription name:'Save Flattened As...'>, <MenuItemDescription name:'Add Reference'>, <MenuItemDescription name:'Add Payload'>, <MenuItemDescription name:'Exit'>] cannot change delegate
2025-04-10 14:32:30 [5,836ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/ambientOcclusion/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/directLighting/sampledLighting/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/indirectDiffuse/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/raytracing/cached/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/reflections/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/sceneDb/ambientLightIntensity'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/translucency/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/viewTile/limit'
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit buffer size 7374781440
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit : valid false, within: false
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit : decrement: 167690, decrement size: 7301033856
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : New limit 9748724 (slope: 439, intercept: 13179904)
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit buffer size 4287352704
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit : valid true, within: true
2025-04-10 14:32:30 [6,083ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/directLighting/sampledLighting/samplesPerPixel'
2025-04-10 14:32:30 [6,083ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/pathtracing/maxSamplesPerLaunch'
2025-04-10 14:32:30 [6,084ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults-transient/meshlights/forceDisable'
2025-04-10 14:32:30 [6,131ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/post/dlss/execMode'
[INFO] Logging experiment in directory: /home/andrea/Documents/DeepLearning/logs/skrl/xRai
Exact experiment name requested from command line 2025-04-10_16-32-33_ppo_torch
Setting seed: 42
[INFO]: Base environment:
Environment device : cuda:0
Environment seed : 42
Physics step-size : 0.005
Rendering step-size : 0.02
Environment step-size : 0.02
[INFO]: Time taken for scene creation : 1.045053 seconds
[INFO]: Scene manager: <class InteractiveScene>
Number of environments: 40
Environment spacing : 1.0
Source prim name : /World/envs/env_0
Global prim paths : ['/World/ground']
Replicate physics : True
[INFO]: Starting the simulation. This may take a few seconds. Please wait...
2025-04-10 14:32:38 [14,146ms] [Warning] [omni.hydra.scene_delegate.plugin] Calling getBypassRenderSkelMeshProcessing for prim /World/envs/env_0/Robot/Biped_robot_new/hip_base_right/visuals.proto_mesh_id3 that has not been populated
[INFO]: Time taken for simulation start : 4.795440 seconds
[INFO] Event Manager: <EventManager> contains 3 active terms.
+---------------------------------------+
| Active Event Terms in Mode: 'reset' |
+--------+------------------------------+
| Index | Name |
+--------+------------------------------+
| 0 | robot_physics_material |
| 1 | base_external_force_torque |
| 2 | reset_base |
+--------+------------------------------+
+---------------------------------------+
| Active Event Terms in Mode: 'startup' |
+---------+-----------------------------+
| Index | Name |
+---------+-----------------------------+
| 0 | scale_all_link_masses |
| 1 | add_base_mass |
+---------+-----------------------------+
+----------------------------------------------+
| Active Event Terms in Mode: 'interval' |
+-------+------------+-------------------------+
| Index | Name | Interval time range (s) |
+-------+------------+-------------------------+
| 0 | push_robot | (10.0, 15.0) |
+-------+------------+-------------------------+
Creating window for environment.
ManagerLiveVisualizer cannot be created for manager: action_manager, Manager does not exist
ManagerLiveVisualizer cannot be created for manager: observation_manager, Manager does not exist
[INFO]: Completed setting up the environment...
[skrl:INFO] Environment wrapper: Isaac Lab (single-agent)
[skrl:INFO] Seed: 42
==================================================
Model (role): policy
==================================================
class GaussianModel(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions,
                 clip_log_std, min_log_std, max_log_std, reduction="sum"):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std, reduction)
        self.net_container = nn.Sequential(
            nn.LazyLinear(out_features=512),
            nn.ELU(),
            nn.LazyLinear(out_features=256),
            nn.ELU(),
            nn.LazyLinear(out_features=128),
            nn.ELU(),
            nn.LazyLinear(out_features=self.num_actions),
        )
        self.log_std_parameter = nn.Parameter(torch.full(size=(self.num_actions,), fill_value=0.0), requires_grad=True)

    def compute(self, inputs, role=""):
        states = unflatten_tensorized_space(self.observation_space, inputs.get("states"))
        taken_actions = unflatten_tensorized_space(self.action_space, inputs.get("taken_actions"))
        output = self.net_container(states)
        return output, self.log_std_parameter, {}
--------------------------------------------------
==================================================
Model (role): value
==================================================
class DeterministicModel(DeterministicMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions):
        Model.__init__(self, observation_space, action_space, device)
        DeterministicMixin.__init__(self, clip_actions)
        self.net_container = nn.Sequential(
            nn.LazyLinear(out_features=512),
            nn.ELU(),
            nn.LazyLinear(out_features=256),
            nn.ELU(),
            nn.LazyLinear(out_features=128),
            nn.ELU(),
            nn.LazyLinear(out_features=1),
        )

    def compute(self, inputs, role=""):
        states = unflatten_tensorized_space(self.observation_space, inputs.get("states"))
        taken_actions = unflatten_tensorized_space(self.action_space, inputs.get("taken_actions"))
        output = self.net_container(states)
        return output, {}
--------------------------------------------------
0%| | 4796/4800000 [08:08<165:27:17, 8.05it/s]
Here is the link to a video of the issue.
Steps to reproduce
System Info
Describe the characteristics of your environment:
- Commit: [e.g. 8f3b9ca]
- Isaac Sim Version: 4.5
- GPU: RTX-4070
- CUDA: 12.4
- GPU Driver: 550.120
Checklist
- [X] I have checked that there is no similar issue in the repo (required)
- [ ] I have checked that the issue is not in running Isaac Sim itself and is related to the repo
Update:
For some reason the values went to NaN, as shown in TensorBoard:
Thank you for posting this. We can't access the video in the link you posted. Could you post here a portion of it that can be uploaded with the issue? Thanks.
https://github.com/user-attachments/assets/2067cc65-f720-40c6-bd66-7aa96f4e2be2
I have made a step forward with the issue. At a certain point during training the PPO action values become NaN. Why?
I have encountered something similar before as well. In my experience with those issues, objects or robots disappearing is usually caused by physics instability: the solution diverged and the values exploded. You can try a smaller dt or higher solver iterations. For me, one of those usually solves the issue.
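For reference, a minimal sketch of where those two knobs live in Isaac Lab (2.x API; the values are illustrative, not tuned for your robot):

```python
import isaaclab.sim as sim_utils
from isaaclab.sim import SimulationCfg

# Smaller physics step: e.g. 0.002 s instead of 0.005 s
# (render_interval=10 keeps the 0.02 s rendering step from your log)
sim_cfg = SimulationCfg(dt=0.002, render_interval=10)

# Higher solver iterations, set in the robot asset's spawn configuration
articulation_props = sim_utils.ArticulationRootPropertiesCfg(
    solver_position_iteration_count=8,  # more position iterations -> more stable contacts
    solver_velocity_iteration_count=4,
)
```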
Thank you @zoctipus for your suggestion. My dt is 0.005 (I can try a smaller one). If I decrease the learning rate of the PPO the problem is much less frequent, but still present (I don't know if this is a correct way to address it).
The first thing that goes NaN are the action values from the PPO. Is it correct if I clip those values before applying them to the model?
Good job finding out that the learning rate seems to be somewhat helpful! A physics dt of 0.005 is usually an OK value, though I do see some of my environments produce a very poor policy with 0.005 but a much better one when dt is changed to 0.002 or 0.001.
That said, the NaN can come either from the environment or from your policy network; you need to find out where the NaN comes from.
If the NaN comes from the network, that usually indicates the observations are not normalized, the rewards are too extreme, the learning rate is too high, the exploration is too aggressive, etc.: the values that matter to learning are unstable and cause the network weights to explode. A smaller learning rate can be helpful, but the defaults are usually pretty good; if you find you need a much smaller learning rate than the default for training to work, that may point to other problems.
If the NaN comes from the environment (like the robot disappearing), that means the PhysX solution diverged. It can be that the robot's actions are too aggressive (so the solution diverges), or that the update rate is too large (so the solution diverges); that can make the state readings NaN, which in turn makes the parts of the observations and rewards that depend on those states become NaN...
These issues are hard to debug by nature, and forming an educated guess is crucial in diagnosing the problem.
Your report that a smaller learning rate results in fewer NaN issues makes me wonder: do you have a reward that is very large? Are your observations normalized? If the action-rate penalty becomes extremely negative because the actions explode, maybe try clipping the action-rate reward range so that the crazy reward never reaches the PPO.
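One quick way to localize it is to assert finiteness at each boundary of the env. A minimal sketch (the helper name and the call sites are just illustrative):

```python
import torch

def check_finite(name: str, tensor: torch.Tensor) -> None:
    """Fail fast when a tensor stops being finite, so you know which side produced the NaN."""
    if not torch.isfinite(tensor).all():
        bad = (~torch.isfinite(tensor)).nonzero()
        raise RuntimeError(f"Non-finite values in '{name}' at indices {bad[:5].tolist()}")

# Call it at the boundaries of your env, e.g.:
#   check_finite("actions", actions)    # in _pre_physics_step -> NaN produced by the policy
#   check_finite("obs", obs)            # in _get_observations -> NaN coming from the physics state
#   check_finite("rewards", rewards)    # in _get_rewards      -> NaN produced by reward terms
```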
@zoctipus It seems my network is generating the NaN. The rewards do not seem too high (I took the values from the Cassie task):
```python
lin_vel_z_weight = -2.0
ang_vel_xy_weight = -0.05
action_rate_weight = -0.015
undesired_contact_weight = -1.0
# REWARDS
lin_vel_xy_weight = 2.0
ang_vel_z_weight = 1.0
feer_air_weight = 0.125
```
The PPO configuration does not seem too extreme either:
```yaml
rollouts: 16
learning_epochs: 5
mini_batches: 4
discount_factor: 0.99  # gamma
lambda: 0.95
learning_rate: 5.0e-04  # <- get NaN also with this value (changed from 1.0e-03)
```
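For observation normalization (which @zoctipus asked about): if I read the skrl examples correctly, it would be enabled with keys like these in the agent section of the yaml (just a sketch, not verified on my setup):

```yaml
agent:
  class: PPO
  grad_norm_clip: 1.0                        # clip gradients to keep updates bounded
  state_preprocessor: RunningStandardScaler  # running mean/std normalization of observations
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
```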
Another problem is that the model does not seem to be learning, so maybe there is something here that I'm not able to see (the PPO keeps doing the same actions and is not able to improve).
https://github.com/user-attachments/assets/274c14a2-a95e-4a1a-8be4-e0ee3523715a
Hi @AndreaRossetto
You can try clipping the actions explicitly or narrowing the output range (e.g., using tanh), as discussed in https://github.com/isaac-sim/IsaacLab/pull/984#issuecomment-2391488483
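For example, a minimal sketch of the second option (the layer sizes just mirror the model printed above; rescaling the output to your joint limits is up to you):

```python
import torch.nn as nn

num_actions = 12  # illustrative: one output per actuated joint

net_container = nn.Sequential(
    nn.LazyLinear(out_features=512),
    nn.ELU(),
    nn.LazyLinear(out_features=256),
    nn.ELU(),
    nn.LazyLinear(out_features=128),
    nn.ELU(),
    nn.LazyLinear(out_features=num_actions),
    nn.Tanh(),  # bounds the mean to [-1, 1] so it cannot blow up to huge values
)
```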
Hi @Toni-SM, I tried to clip the PPO actions with the `clip_actions` parameter, but it does not seem to work (I keep getting NaN values):
```yaml
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: True
    clip_log_std: True
```
Is it correct if I clip in this way?
```python
def _pre_physics_step(self, actions: torch.Tensor) -> None:
    self._actions = actions.clone().clamp(-10.0, 10.0)
    self._processed_actions = self.cfg.action_scale * self._actions + self._robot.data.default_joint_pos
```
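Or should I also replace non-finite values explicitly before clamping, something like this (just a sketch; the clamp range is the same one I use above)?

```python
import torch

def _pre_physics_step(self, actions: torch.Tensor) -> None:
    # replace NaN/Inf coming from the policy before they reach the simulation
    actions = torch.nan_to_num(actions, nan=0.0, posinf=10.0, neginf=-10.0)
    self._actions = actions.clone().clamp(-10.0, 10.0)
    self._processed_actions = self.cfg.action_scale * self._actions + self._robot.data.default_joint_pos
```

(Although I guess this would only hide the symptom rather than fix the source of the NaN.)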
Did you also reset your robot's velocity? Make sure your reset function is proper. At least in my case, a proper reset function was the solution. Bad collisions also cause robots to disappear.
@celestialdr4g0n my reset function is this; I reset the pose and also the velocities:
```python
def _reset_idx(self, env_ids: torch.Tensor | None):
    # print("RESET")
    if env_ids is None or len(env_ids) == self.num_envs:
        env_ids = self._robot._ALL_INDICES
    self._robot.reset(env_ids)
    super()._reset_idx(env_ids)
    if len(env_ids) == self.num_envs:
        # Spread out the resets to avoid spikes in training when many environments reset at a similar time
        self.episode_length_buf[:] = torch.randint_like(self.episode_length_buf, high=int(self.max_episode_length))
    self._actions[env_ids] = 0.0
    self._previous_actions[env_ids] = 0.0
    self.cmdRange = modify_command_velocity(
        env=self,
        term_name='lin_vel_xy_exp',
        rew=self.tot_reward,
        weight=self.cfg.lin_vel_xy_weight,
        max_velocity=[-1.5, 3.0],
        interval=200 * 24,
        starting_step=20000 * 24,
    )
    # Sample new commands, linear velocity x, y and heading
    self._commands[env_ids, :3] = torch.zeros_like(self._commands[env_ids, :3]).uniform_(self.cmdRange[0], self.cmdRange[1])
    # Sample new commands, body angles (pitch, roll and yaw)
    self._commands[env_ids, 3:] = torch.zeros_like(self._commands[env_ids, :3]).uniform_(self.angleRange[0], self.angleRange[1])
    self.body[env_ids] = self._robot._data.body_pos_w[env_ids, self._base_id].squeeze(1)
    # Reset feet references
    # self.feetRef[env_ids] = torch.zeros_like(self.feetRef[env_ids])
    # Reset robot state
    joint_pos = self._robot.data.default_joint_pos[env_ids]
    joint_vel = self._robot.data.default_joint_vel[env_ids]
    default_root_state = self._robot.data.default_root_state[env_ids]
    default_root_state[:, :3] += self._terrain.env_origins[env_ids]
    self._robot.write_root_pose_to_sim(default_root_state[:, :7], env_ids)
    self._robot.write_root_velocity_to_sim(default_root_state[:, 7:], env_ids)
    self._robot.write_joint_state_to_sim(joint_pos, joint_vel, None, env_ids)
    self._robot._external_force_b[env_ids] = 0.0
    self._robot._external_torque_b[env_ids] = 0.0
    modify_push_force(
        env=self,
        env_ids=self.num_envs,
        term_name="push_robot",
        max_velocity=[-3.0, 3.0],
        interval=400 * 24,
        starting_step=10000 * 24,
    )
    # Logging
    extras = dict()
    for key in self._episode_sums.keys():
        episodic_sum_avg = torch.mean(self._episode_sums[key][env_ids])
        extras["Episode_Reward/" + key] = episodic_sum_avg / self.max_episode_length_s
        self._episode_sums[key][env_ids] = 0.0
    self.extras["log"] = dict()
    self.extras["log"].update(extras)
    extras = dict()
    extras["Episode_Termination/base_contact"] = torch.count_nonzero(self.reset_terminated[env_ids]).item()
    extras["Episode_Termination/time_out"] = torch.count_nonzero(self.reset_time_outs[env_ids]).item()
    self.extras["log"].update(extras)
```
I've noticed one strange thing during training: at the beginning it seems OK (maybe), and the PPO tries to do something:
https://github.com/user-attachments/assets/a9b7484b-4ef3-4180-a6cf-306f5adee4e1
After a while of training it does this:
https://github.com/user-attachments/assets/a265f7d0-b507-4f90-802b-f06a6b85a29a
I don't know if it is just my impression, but it seems that the gravity is not correct. Am I wrong?
Yes, there it is. I mean zero the velocity; I do not know why using the default velocity does not work. Below is my code:
```python
root_state = self._robot.data.default_root_state.clone()[env_ids]
root_state[:, 0:3] += self.scene.env_origins[env_ids]
root_vel_zeros = torch.zeros_like(root_state[:, 7:])
self._robot.write_root_pose_to_sim(root_state[:, 0:7], env_ids=env_ids)
self._robot.write_root_velocity_to_sim(root_vel_zeros, env_ids=env_ids)
```
Thank you @celestialdr4g0n, I changed the reset part as you mentioned and it seems better, but I still have that strange movement:
https://github.com/user-attachments/assets/d376a054-5354-400b-ab72-943435471603
I have no idea why it behaves this way. But you are working with your own robot, so make sure your functions do their jobs.
Can you really control each joint of your robot as you want? (Make sure the output of the policy "goes into" your robot properly.)
In addition, 40 envs is a small number. If you trained with a larger number of envs, just ignore this suggestion.
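A quick sanity check for the joint mapping is to bypass the policy and drive one joint at a time with an open-loop signal, then watch in the viewport whether the expected joint moves. A rough sketch (assumes `env` is your instantiated environment and the actions are per-joint targets):

```python
import math
import torch

num_actions = env.action_space.shape[-1]
for joint_id in range(num_actions):
    for step in range(200):
        actions = torch.zeros((env.num_envs, num_actions), device=env.device)
        # sinusoid on a single joint; every other joint stays at its default
        actions[:, joint_id] = 0.5 * math.sin(2.0 * math.pi * step / 100.0)
        env.step(actions)
```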
I'm trying with a larger number of envs (100/200). I see the PPO is moving all the joints, but sometimes it seems to move the joints once and then wait for the robot to die. The next step maybe repeats the same movements.
I've made some steps forward, and now at least it seems to be starting to work... I think there are still some problems (the FPS is really low with 400 envs):
https://github.com/user-attachments/assets/ca592a5d-ecf4-4ea8-9760-f821de84de0b
Hi @AndreaRossetto, I'm facing the exact same issue but with DDPG from skrl (for training a quadruped). Following this issue I tried to fix my code, but without any success. Can we have a quick chat/call about it? (in/pietro-dardano) Thanks :)
@pietrodardano yes, we can have a chat or a call, whatever you prefer.
Thank you for following up on this issue. I will move the post to our Discussions for now.