[Bug Report] During training the models disappear from the scene
Describe the bug
My custom robot model disappears after some time of training, without any errors. What can I do to solve the problem?
[INFO][AppLauncher]: Loading experience file: /home/andrea/.local/share/ov/pkg/IsaacLab-2.0.2/apps/isaaclab.python.rendering.kit
Loading user config located at: '/home/andrea/.local/share/ov/pkg/isaac-sim-4.5.0/kit/data/Kit/Isaac-Sim/4.5/user.config.json'
[Info] [carb] Logging to file: /home/andrea/.local/share/ov/pkg/isaac-sim-4.5.0/kit/logs/Kit/Isaac-Sim/4.5/kit_20250410_163224.log
2025-04-10 14:32:24 [0ms] [Warning] [omni.kit.app.plugin] No crash reporter present, dumps uploading isn't available.
2025-04-10 14:32:24 [573ms] [Warning] [carb.windowing-glfw.gamepad] Joystick with unknown remapping detected (will be ignored): FrSky FrSky Simulator [03000000830400002057000000010000]
|---------------------------------------------------------------------------------------------|
| Driver Version: 550.120 | Graphics API: Vulkan
|=============================================================================================|
| GPU | Name | Active | LDA | GPU Memory | Vendor-ID | LUID |
| | | | | | Device-ID | UUID |
| | | | | | Bus-ID | |
|---------------------------------------------------------------------------------------------|
| 0 | NVIDIA GeForce RTX 4070 | Yes: 0 | | 12282 MB | 10de | 0 |
| | | | | | 2786 | 5d9cd715.. |
| | | | | | 2d | |
|=============================================================================================|
| OS: 22.04.5 LTS (Jammy Jellyfish) ubuntu, Version: 22.04.5, Kernel: 6.8.0-57-generic
| XServer Vendor: The X.Org Foundation, XServer Version: 12101004 (1.21.1.4)
| Processor: AMD Ryzen 9 5950X 16-Core Processor
| Cores: 16 | Logical Cores: 32
|---------------------------------------------------------------------------------------------|
| Total Memory (MB): 64216 | Free Memory: 55529
| Total Page/Swap (MB): 130757 | Free Page/Swap: 130757
|---------------------------------------------------------------------------------------------|
2025-04-10 14:32:25 [1,056ms] [Warning] [gpu.foundation.plugin] IOMMU is enabled.
2025-04-10 14:32:26 [2,048ms] [Warning] [omni.log] Source: omni.hydra was already registered.
2025-04-10 14:32:26 [2,197ms] [Warning] [omni.isaac.dynamic_control] omni.isaac.dynamic_control is deprecated as of Isaac Sim 4.5. No action is needed from end-users.
2025-04-10 14:32:28 [3,750ms] [Warning] [omni.replicator.core.scripts.extension] No material configuration file, adding configuration to material settings directly.
2025-04-10 14:32:28 [4,485ms] [Warning] [omni.kit.menu.utils.app_menu] add_menu_items: menu [<MenuItemDescription name:'New'>, <MenuItemDescription name:'Open'>, <MenuItemDescription name:'Re-open with New Edit Layer'>, <MenuItemDescription name:'Save'>, <MenuItemDescription name:'Save With Options'>, <MenuItemDescription name:'Save As...'>, <MenuItemDescription name:'Save Flattened As...'>, <MenuItemDescription name:'Add Reference'>, <MenuItemDescription name:'Add Payload'>, <MenuItemDescription name:'Exit'>] cannot change delegate
2025-04-10 14:32:30 [5,836ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/ambientOcclusion/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/directLighting/sampledLighting/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/indirectDiffuse/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/raytracing/cached/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/reflections/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/sceneDb/ambientLightIntensity'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/translucency/enabled'
2025-04-10 14:32:30 [5,837ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/viewTile/limit'
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit buffer size 7374781440
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit : valid false, within: false
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit : decrement: 167690, decrement size: 7301033856
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : New limit 9748724 (slope: 439, intercept: 13179904)
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit buffer size 4287352704
2025-04-10 14:32:30 [5,972ms] [Warning] [rtx.scenedb.plugin] SceneDbContext : TLAS limit : valid true, within: true
2025-04-10 14:32:30 [6,083ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/directLighting/sampledLighting/samplesPerPixel'
2025-04-10 14:32:30 [6,083ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/pathtracing/maxSamplesPerLaunch'
2025-04-10 14:32:30 [6,084ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults-transient/meshlights/forceDisable'
2025-04-10 14:32:30 [6,131ms] [Warning] [omni.usd-abi.plugin] No setting was found for '/rtx-defaults/post/dlss/execMode'
[INFO] Logging experiment in directory: /home/andrea/Documents/DeepLearning/logs/skrl/xRai
Exact experiment name requested from command line 2025-04-10_16-32-33_ppo_torch
Setting seed: 42
[INFO]: Base environment:
Environment device : cuda:0
Environment seed : 42
Physics step-size : 0.005
Rendering step-size : 0.02
Environment step-size : 0.02
[INFO]: Time taken for scene creation : 1.045053 seconds
[INFO]: Scene manager: <class InteractiveScene>
Number of environments: 40
Environment spacing : 1.0
Source prim name : /World/envs/env_0
Global prim paths : ['/World/ground']
Replicate physics : True
[INFO]: Starting the simulation. This may take a few seconds. Please wait...
2025-04-10 14:32:38 [14,146ms] [Warning] [omni.hydra.scene_delegate.plugin] Calling getBypassRenderSkelMeshProcessing for prim /World/envs/env_0/Robot/Biped_robot_new/hip_base_right/visuals.proto_mesh_id3 that has not been populated
[INFO]: Time taken for simulation start : 4.795440 seconds
[INFO] Event Manager: <EventManager> contains 3 active terms.
+---------------------------------------+
| Active Event Terms in Mode: 'reset' |
+--------+------------------------------+
| Index | Name |
+--------+------------------------------+
| 0 | robot_physics_material |
| 1 | base_external_force_torque |
| 2 | reset_base |
+--------+------------------------------+
+---------------------------------------+
| Active Event Terms in Mode: 'startup' |
+---------+-----------------------------+
| Index | Name |
+---------+-----------------------------+
| 0 | scale_all_link_masses |
| 1 | add_base_mass |
+---------+-----------------------------+
+----------------------------------------------+
| Active Event Terms in Mode: 'interval' |
+-------+------------+-------------------------+
| Index | Name | Interval time range (s) |
+-------+------------+-------------------------+
| 0 | push_robot | (10.0, 15.0) |
+-------+------------+-------------------------+
Creating window for environment.
ManagerLiveVisualizer cannot be created for manager: action_manager, Manager does not exist
ManagerLiveVisualizer cannot be created for manager: observation_manager, Manager does not exist
[INFO]: Completed setting up the environment...
[skrl:INFO] Environment wrapper: Isaac Lab (single-agent)
[skrl:INFO] Seed: 42
==================================================
Model (role): policy
==================================================
class GaussianModel(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions,
                 clip_log_std, min_log_std, max_log_std, reduction="sum"):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std, reduction)
        self.net_container = nn.Sequential(
            nn.LazyLinear(out_features=512),
            nn.ELU(),
            nn.LazyLinear(out_features=256),
            nn.ELU(),
            nn.LazyLinear(out_features=128),
            nn.ELU(),
            nn.LazyLinear(out_features=self.num_actions),
        )
        self.log_std_parameter = nn.Parameter(torch.full(size=(self.num_actions,), fill_value=0.0), requires_grad=True)

    def compute(self, inputs, role=""):
        states = unflatten_tensorized_space(self.observation_space, inputs.get("states"))
        taken_actions = unflatten_tensorized_space(self.action_space, inputs.get("taken_actions"))
        output = self.net_container(states)
        return output, self.log_std_parameter, {}
--------------------------------------------------
==================================================
Model (role): value
==================================================
class DeterministicModel(DeterministicMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions):
        Model.__init__(self, observation_space, action_space, device)
        DeterministicMixin.__init__(self, clip_actions)
        self.net_container = nn.Sequential(
            nn.LazyLinear(out_features=512),
            nn.ELU(),
            nn.LazyLinear(out_features=256),
            nn.ELU(),
            nn.LazyLinear(out_features=128),
            nn.ELU(),
            nn.LazyLinear(out_features=1),
        )

    def compute(self, inputs, role=""):
        states = unflatten_tensorized_space(self.observation_space, inputs.get("states"))
        taken_actions = unflatten_tensorized_space(self.action_space, inputs.get("taken_actions"))
        output = self.net_container(states)
        return output, {}
--------------------------------------------------
0%| | 4796/4800000 [08:08<165:27:17, 8.05it/s]
Here is the link to a video of the issue.
Steps to reproduce
System Info
Describe the characteristics of your environment:
- Commit: [e.g. 8f3b9ca]
- Isaac Sim Version: 4.5
- GPU: RTX-4070
- CUDA: 12.4
- GPU Driver: 550.120
Checklist
- [X] I have checked that there is no similar issue in the repo (required)
- [ ] I have checked that the issue is not in running Isaac Sim itself and is related to the repo
Update:
For some reason the values went to NaN, as shown in TensorBoard:
Thank you for posting this. We can't access the video in the link you posted. Could you post here a portion of it that can be uploaded with the issue? Thanks.
https://github.com/user-attachments/assets/2067cc65-f720-40c6-bd66-7aa96f4e2be2
I have made a step forward with the issue. At a certain point during training the PPO action values become NaN. Why?
I have encountered something similar before as well. In my experience with those issues, objects or robots disappearing is usually caused by physics instability: the solution diverged and the values exploded. You can try a smaller dt or higher solver iterations. For me, one of those usually solves the issue.
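For reference, a minimal sketch of where those two knobs live in Isaac Lab (2.x API; the values are illustrative, not tuned for your robot):

```python
import isaaclab.sim as sim_utils
from isaaclab.sim import SimulationCfg

# Smaller physics step: e.g. 0.002 s instead of 0.005 s
# (render_interval=10 keeps the 0.02 s rendering step from your log)
sim_cfg = SimulationCfg(dt=0.002, render_interval=10)

# Higher solver iterations, set in the robot asset's spawn configuration
articulation_props = sim_utils.ArticulationRootPropertiesCfg(
    solver_position_iteration_count=8,  # more position iterations -> more stable contacts
    solver_velocity_iteration_count=4,
)
```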
Thank you @zoctipus for your suggestion. My dt is 0.005 (I can try a smaller one). If I decrease the learning rate of the PPO the problem is much less frequent, but still present (I don't know if this is a correct way to address it).
The first thing that goes NaN are the action values from the PPO. Is it correct if I clip those values before applying them to the model?
Good job finding out that the learning rate seems to be somewhat helpful! A physics dt of 0.005 is usually an OK value, though I do see some of my environments produce a very poor policy with 0.005 but a much better one when dt is changed to 0.002 or 0.001.
That said, the NaN can come either from the environment or from your policy network; you need to find out where the NaN comes from.
If the NaN comes from the network, that usually indicates the observations are not normalized, the rewards are too extreme, the learning rate is too high, the exploration is too aggressive, etc.: the values that matter to learning are unstable and cause the network weights to explode. A smaller learning rate can be helpful, but the defaults are usually pretty good; if you find you need a much smaller learning rate than the default for training to work, that may point to other problems.
If the NaN comes from the environment (like the robot disappearing), that means the PhysX solution diverged. It can be that the robot's actions are too aggressive (so the solution diverges), or that the update rate is too large (so the solution diverges); that can make the state readings NaN, which in turn makes the parts of the observations and rewards that depend on those states become NaN...
These issues are hard to debug by nature, and forming an educated guess is crucial in diagnosing the problem.
Your report that a smaller learning rate results in fewer NaN issues makes me wonder: do you have a reward that is very large? Are your observations normalized? If the action-rate penalty becomes extremely negative because the actions explode, maybe try clipping the action-rate reward range so that the crazy reward never reaches the PPO.
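One quick way to localize it is to assert finiteness at each boundary of the env. A minimal sketch (the helper name and the call sites are just illustrative):

```python
import torch

def check_finite(name: str, tensor: torch.Tensor) -> None:
    """Fail fast when a tensor stops being finite, so you know which side produced the NaN."""
    if not torch.isfinite(tensor).all():
        bad = (~torch.isfinite(tensor)).nonzero()
        raise RuntimeError(f"Non-finite values in '{name}' at indices {bad[:5].tolist()}")

# Call it at the boundaries of your env, e.g.:
#   check_finite("actions", actions)    # in _pre_physics_step -> NaN produced by the policy
#   check_finite("obs", obs)            # in _get_observations -> NaN coming from the physics state
#   check_finite("rewards", rewards)    # in _get_rewards      -> NaN produced by reward terms
```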
@zoctipus It seems my network is generating the NaN. The rewards do not seem too high (I took the values from the Cassie task):
```python
lin_vel_z_weight = -2.0
ang_vel_xy_weight = -0.05
action_rate_weight = -0.015
undesired_contact_weight = -1.0
# REWARDS
lin_vel_xy_weight = 2.0
ang_vel_z_weight = 1.0
feer_air_weight = 0.125
```
The PPO configuration does not seem too extreme either:
```yaml
rollouts: 16
learning_epochs: 5
mini_batches: 4
discount_factor: 0.99  # gamma
lambda: 0.95
learning_rate: 5.0e-04  # <- get NaN also with this value (changed from 1.0e-03)
```
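For observation normalization (which @zoctipus asked about): if I read the skrl examples correctly, it would be enabled with keys like these in the agent section of the yaml (just a sketch, not verified on my setup):

```yaml
agent:
  class: PPO
  grad_norm_clip: 1.0                        # clip gradients to keep updates bounded
  state_preprocessor: RunningStandardScaler  # running mean/std normalization of observations
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
```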
Another problem is that the model does not seem to be learning, so maybe there is something here that I'm not able to see (the PPO keeps doing the same actions and is not able to improve).
https://github.com/user-attachments/assets/274c14a2-a95e-4a1a-8be4-e0ee3523715a
Hi @AndreaRossetto
You can try clipping the actions explicitly or narrowing the output range (e.g., using tanh), as discussed in https://github.com/isaac-sim/IsaacLab/pull/984#issuecomment-2391488483
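For example, a minimal sketch of the second option (the layer sizes just mirror the model printed above; rescaling the output to your joint limits is up to you):

```python
import torch.nn as nn

num_actions = 12  # illustrative: one output per actuated joint

net_container = nn.Sequential(
    nn.LazyLinear(out_features=512),
    nn.ELU(),
    nn.LazyLinear(out_features=256),
    nn.ELU(),
    nn.LazyLinear(out_features=128),
    nn.ELU(),
    nn.LazyLinear(out_features=num_actions),
    nn.Tanh(),  # bounds the mean to [-1, 1] so it cannot blow up to huge values
)
```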
Hi @Toni-SM, I tried to clip the PPO actions with the `clip_actions` parameter, but it does not seem to work (I keep getting NaN values):
```yaml
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: True
    clip_log_std: True
```
Is it correct if I clip in this way?
```python
def _pre_physics_step(self, actions: torch.Tensor) -> None:
    self._actions = actions.clone().clamp(-10.0, 10.0)
    self._processed_actions = self.cfg.action_scale * self._actions + self._robot.data.default_joint_pos
```
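Or should I also replace non-finite values explicitly before clamping, something like this (just a sketch; the clamp range is the same one I use above)?

```python
import torch

def _pre_physics_step(self, actions: torch.Tensor) -> None:
    # replace NaN/Inf coming from the policy before they reach the simulation
    actions = torch.nan_to_num(actions, nan=0.0, posinf=10.0, neginf=-10.0)
    self._actions = actions.clone().clamp(-10.0, 10.0)
    self._processed_actions = self.cfg.action_scale * self._actions + self._robot.data.default_joint_pos
```

(Although I guess this would only hide the symptom rather than fix the source of the NaN.)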
Did you also reset your robot's velocity? Make sure your reset function is proper. At least in my case, a proper reset function was the solution. Bad collisions also cause robots to disappear.
@celestialdr4g0n my reset function is this; I reset the pose and also the velocities:
```python
def _reset_idx(self, env_ids: torch.Tensor | None):
    # print("RESET")
    if env_ids is None or len(env_ids) == self.num_envs:
        env_ids = self._robot._ALL_INDICES
    self._robot.reset(env_ids)
    super()._reset_idx(env_ids)
    if len(env_ids) == self.num_envs:
        # Spread out the resets to avoid spikes in training when many environments reset at a similar time
        self.episode_length_buf[:] = torch.randint_like(self.episode_length_buf, high=int(self.max_episode_length))
    self._actions[env_ids] = 0.0
    self._previous_actions[env_ids] = 0.0
    self.cmdRange = modify_command_velocity(
        env=self,
        term_name='lin_vel_xy_exp',
        rew=self.tot_reward,
        weight=self.cfg.lin_vel_xy_weight,
        max_velocity=[-1.5, 3.0],
        interval=200 * 24,
        starting_step=20000 * 24,
    )
    # Sample new commands, linear velocity x, y and heading
    self._commands[env_ids, :3] = torch.zeros_like(self._commands[env_ids, :3]).uniform_(self.cmdRange[0], self.cmdRange[1])
    # Sample new commands, body angles (pitch, roll and yaw)
    self._commands[env_ids, 3:] = torch.zeros_like(self._commands[env_ids, :3]).uniform_(self.angleRange[0], self.angleRange[1])
    self.body[env_ids] = self._robot._data.body_pos_w[env_ids, self._base_id].squeeze(1)
    # Reset feet references
    # self.feetRef[env_ids] = torch.zeros_like(self.feetRef[env_ids])
    # Reset robot state
    joint_pos = self._robot.data.default_joint_pos[env_ids]
    joint_vel = self._robot.data.default_joint_vel[env_ids]
    default_root_state = self._robot.data.default_root_state[env_ids]
    default_root_state[:, :3] += self._terrain.env_origins[env_ids]
    self._robot.write_root_pose_to_sim(default_root_state[:, :7], env_ids)
    self._robot.write_root_velocity_to_sim(default_root_state[:, 7:], env_ids)
    self._robot.write_joint_state_to_sim(joint_pos, joint_vel, None, env_ids)
    self._robot._external_force_b[env_ids] = 0.0
    self._robot._external_torque_b[env_ids] = 0.0
    modify_push_force(
        env=self,
        env_ids=self.num_envs,
        term_name="push_robot",
        max_velocity=[-3.0, 3.0],
        interval=400 * 24,
        starting_step=10000 * 24,
    )
    # Logging
    extras = dict()
    for key in self._episode_sums.keys():
        episodic_sum_avg = torch.mean(self._episode_sums[key][env_ids])
        extras["Episode_Reward/" + key] = episodic_sum_avg / self.max_episode_length_s
        self._episode_sums[key][env_ids] = 0.0
    self.extras["log"] = dict()
    self.extras["log"].update(extras)
    extras = dict()
    extras["Episode_Termination/base_contact"] = torch.count_nonzero(self.reset_terminated[env_ids]).item()
    extras["Episode_Termination/time_out"] = torch.count_nonzero(self.reset_time_outs[env_ids]).item()
    self.extras["log"].update(extras)
```
I've noticed one strange thing during training: at the beginning it seems OK (maybe), and the PPO tries to do something:
https://github.com/user-attachments/assets/a9b7484b-4ef3-4180-a6cf-306f5adee4e1
After a while of training it does this:
https://github.com/user-attachments/assets/a265f7d0-b507-4f90-802b-f06a6b85a29a
I don't know if it is just my impression, but it seems that the gravity is not correct. Am I wrong?
Yes, there it is. I mean zero the velocity; I do not know why using the default velocity does not work. Below is my code:
```python
root_state = self._robot.data.default_root_state.clone()[env_ids]
root_state[:, 0:3] += self.scene.env_origins[env_ids]
root_vel_zeros = torch.zeros_like(root_state[:, 7:])
self._robot.write_root_pose_to_sim(root_state[:, 0:7], env_ids=env_ids)
self._robot.write_root_velocity_to_sim(root_vel_zeros, env_ids=env_ids)
```
Thank you @celestialdr4g0n, I changed the reset part as you mentioned and it seems better, but I still have that strange movement:
https://github.com/user-attachments/assets/d376a054-5354-400b-ab72-943435471603
I have no idea why it behaves this way. But you are working with your own robot, so make sure your functions do their jobs.
Can you really control each joint of your robot as you want? (Make sure the output of the policy "goes into" your robot properly.)
In addition, 40 envs is a small number. If you trained with a larger number of envs, just ignore this suggestion.
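A quick sanity check for the joint mapping is to bypass the policy and drive one joint at a time with an open-loop signal, then watch in the viewport whether the expected joint moves. A rough sketch (assumes `env` is your instantiated environment and the actions are per-joint targets):

```python
import math
import torch

num_actions = env.action_space.shape[-1]
for joint_id in range(num_actions):
    for step in range(200):
        actions = torch.zeros((env.num_envs, num_actions), device=env.device)
        # sinusoid on a single joint; every other joint stays at its default
        actions[:, joint_id] = 0.5 * math.sin(2.0 * math.pi * step / 100.0)
        env.step(actions)
```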
I'm trying with a larger number of envs (100/200). I see the PPO is moving all the joints, but sometimes it seems to move the joints once and then wait for the robot to die. The next step maybe repeats the same movements.
I've made some steps forward, and now at least it seems to be starting to work... I think there are still some problems (the FPS is really low with 400 envs):
https://github.com/user-attachments/assets/ca592a5d-ecf4-4ea8-9760-f821de84de0b
Hi @AndreaRossetto, I'm facing the exact same issue but with DDPG from skrl (for training a quadruped). Following this issue I tried to fix my code, but without any success. Can we have a quick chat/call about it? (in/pietro-dardano) Thanks :)
@pietrodardano yes, we can have a chat or a call, whatever you prefer.
Thank you for following up on this issue. I will move the post to our Discussions for now.