
Potential memory leak on Maniskill3?

Open · IrvingF7 opened this issue 9 months ago · 6 comments

Hi!

Thanks for open-sourcing this awesome project.

Recently, I switched to the maniskill3 branch, and I noticed that I keep getting OOM errors if I cycle through too many envs.

My workflow is roughly as follows:

I make env A, run some parallel testing, close and delete the env by calling env.close() and del env, then make env B, and rinse and repeat, all in a single process.
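
Schematically, the pattern is this (the env IDs and num_envs here are just placeholders):

    import gymnasium as gym

    for task in ["EnvA-v1", "EnvB-v1"]:  # placeholder env IDs
        env = gym.make(task, num_envs=16, obs_mode="rgb+segmentation")
        # ... parallel testing ...
        env.close()
        del env  # VRAM should be released here, but it is not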

But I noticed that the VRAM does not drop when an env is closed.

[Image: plot of VRAM usage over time; usage steps up each time a new env is created and never drops back down after env.close()]

I double-checked the timestamps, and the moments at which VRAM increases do indeed line up with the moments a new env is made.

The specific error message I got is this:

RuntimeError: Unable to create GPU parallelized camera group. If the error is about being unable to create a buffer, you are likely using too many Cameras. Either use less cameras (via less parallel envs) and/or reduce the size of the cameras. Another common cause is using a memory intensive shader, you can try using the 'minimal' shader which optimizes for GPU memory but disables some advanced functionalities. Another option is to avoid rendering with the rgb_array mode / using the human render cameras as they can be more memory intensive as they typically have higher resolutions for the purposes of visualization.

I came across this issue on ManiSkill's main repo, but it seems like the API to manually clear the assets is no longer in the codebase.
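
As an aside, I know the error suggests the 'minimal' shader; that can be selected through the same sensor_configs field I already pass to gym.make (a sketch mirroring my eval code further below, where I currently use "default"):

    env = gym.make(
        ms3_task_name,
        obs_mode="rgb+segmentation",
        num_envs=n_parallel_eval,
        sensor_configs={"shader_pack": "minimal"},  # instead of "default"
    )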

IrvingF7 · Apr 28 '25 16:04

@StoneT2000

xuanlinli17 · May 02 '25 01:05

Could try running import gc; gc.collect() after deleting the environment?

And what other code are you running besides the environment?

StoneT2000 · May 02 '25 01:05

And what version of maniskill 3 is being used? git? pypi? nightly?

StoneT2000 · May 02 '25 01:05

> Could try running import gc; gc.collect() after deleting the environment?
>
> And what other code are you running besides the environment?

Hi Stone!

Thanks for the reply. Yes, I did include gc.collect() in my code, and it doesn't change the result, which surprises me.

The maniskill3 version I was using is 3.0.0b20. I don't quite remember how I installed it, though, sorry.

The code I was running works like this: on the websocket client side, an evaluator object is spawned that queries Simpler/Maniskill3 for images and robot states, then packs this data and sends it to a websocket server.

The server is simply a VLA model that accepts the input, generates actions, and sends them back to the client/evaluator to execute.

Currently, everything runs on a single machine, but the architecture is written so that in the future I can run inference separately from the robot.
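
A stripped-down sketch of that client-to-server exchange (the address, serialization, and helper name are illustrative, not my exact code):

    import asyncio
    import json

    import numpy as np
    import websockets  # third-party: pip install websockets

    async def query_policy(element: dict) -> np.ndarray:
        """Send one observation to the VLA server; receive an action chunk."""
        # numpy arrays need to be made serializable first (real code might use msgpack)
        payload = {k: v.tolist() if isinstance(v, np.ndarray) else v
                   for k, v in element.items()}
        async with websockets.connect("ws://localhost:8765") as ws:
            await ws.send(json.dumps(payload))
            actions = json.loads(await ws.recv())
        return np.asarray(actions)  # [batch, action_step, action_dim]

    # usage: action_chunk = asyncio.run(query_policy(element))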

The evaluation code looks like this:

  # module-level imports assumed: collections, gc, os, time, gym, numpy as np,
  # torch, wandb; ManiSkill helpers BaseEnv, common,
  # get_image_from_maniskill3_obs_dict; project helpers setup_logger, images_to_video
  def evaluate(self):
      '''Run evaluation on all tasks in the task list'''

      for gradient_step in self.gradient_steps:
          self._initialize_model_client(gradient_step)
          for task_name in self.task_lists:
              self.evaluate_task(task_name)

          if self.use_wandb:
              wandb.log(self.wandb_metrics, step=int(gradient_step), commit=True)

  @override
  def evaluate_task(self, task_name):
      '''
      Evaluate a single task

      Args:
          task_name: Name of the task to evaluate

      Returns:
          success_rate: The success rate achieved on this task
      '''
      start_task_time = time.time()
      task_seed = self.seed
      # Initialize task-specific logging
      task_log_dir = self.log_dir / task_name
      video_dir = task_log_dir / "videos"
      if self.main_rank:
          os.makedirs(video_dir, exist_ok=True)

      task_logger = setup_logger(
          main_rank=self.main_rank,
          filename=task_log_dir / f"{task_name}.log" if not self.debug else None,  # log to console when debug is True
          debug=self.debug,
          name=f'{task_name}_logger'
      )

      task_logger.info(f"Task suite: {task_name}")
      self.main_logger.info(f"Task suite: {task_name}")

      # Set up environment
      ms3_task_name = self.ms3_translator.get(task_name, task_name)

      env: BaseEnv = gym.make(
          ms3_task_name,
          obs_mode="rgb+segmentation",
          num_envs=self.n_parallel_eval,
          sensor_configs={"shader_pack": "default"},
      )

      cnt_episode = 0
      eval_metrics = collections.defaultdict(list)

      # Set up receding horizon control
      action_plan = collections.deque()

      while cnt_episode < self.n_eval_episode:
          task_seed = task_seed + cnt_episode
          obs, _ = env.reset(seed=task_seed, options={"episode_id": torch.tensor([task_seed + i for i in range(self.n_parallel_eval)])})
          instruction = env.unwrapped.get_language_instruction()

          images = []
          predicted_terminated, truncated = False, False
          images.append(get_image_from_maniskill3_obs_dict(env, obs).cpu().numpy())
          elapsed_steps = 0
          while not (predicted_terminated or truncated):
              if not action_plan:
                  # action horizon is all executed
                  # Query model to get action
                  element = {
                          "observation.images.top": images[-1],
                          "observation.state": obs['agent']['eef_pos'].cpu().numpy(),
                          "task": instruction
                          }
                  action_chunk = self.client.infer(element)

                  # action chunk is of the size [batch, action_step, action_dim]
                  # but dequeue can only take something like [action_step, batch, action_dim]
                  action_plan.extend(action_chunk[:, :self.action_step, :].transpose(1, 0, 2))

              action = action_plan.popleft()
              obs, reward, terminated, truncated, info = env.step(action)
              elapsed_steps += 1
              info = common.to_numpy(info)

              truncated = bool(truncated.any()) # note that all envs truncate and terminate at the same time.
              images.append(get_image_from_maniskill3_obs_dict(env, obs).cpu().numpy())

          for k, v in info.items():
              eval_metrics[k].append(v.flatten())

          if self.pipeline_cfg.eval_cfg.recording:
              for i in range(len(images[-1])):
                  # save video. The naming is ugly but it's to follow previous naming scheme
                  success_string = "_success" if info['success'][i].item() else ""
                  images_to_video([img[i] for img in images], video_dir, f"video_{cnt_episode + i}{success_string}", fps=10, verbose=True)

          cnt_episode += self.n_parallel_eval


      mean_metrics = {k: np.mean(v) for k, v in eval_metrics.items()}
      success_rate = mean_metrics['success']
      task_eval_time = time.time() - start_task_time

      # log results
      self._log_summary(
          logger=task_logger,
          cnt_episode=cnt_episode,
          eval_time=task_eval_time,
          success_rate=success_rate,
      )

      self._log_summary(
          logger=self.main_logger,
          cnt_episode=cnt_episode,
          eval_time=task_eval_time,
          success_rate=success_rate,
      )

      if self.use_wandb:
          self.wandb_metrics[task_name] = success_rate

      env.close()
      del env
      gc.collect()
      torch.cuda.empty_cache()
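
For reference, this is roughly how I confirm the VRAM numbers around teardown (a sketch; torch.cuda.memory_allocated() only sees PyTorch's own allocations, so I query nvidia-smi instead):

    import subprocess

    def gpu_mem_used_mib(device_index: int = 0) -> int:
        """Used VRAM in MiB via nvidia-smi; unlike torch.cuda.memory_allocated(),
        this also counts allocations made outside PyTorch (e.g., the renderer's)."""
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits", "-i", str(device_index)]
        )
        return int(out.decode().strip())

    # around teardown:
    #   before = gpu_mem_used_mib()
    #   env.close(); del env; gc.collect(); torch.cuda.empty_cache()
    #   delta = gpu_mem_used_mib() - before  # stays ~0, i.e., nothing is freed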

IrvingF7 · May 02 '25 02:05

I see. Could you try pip uninstall maniskill and then install mani-skill-nightly?

StoneT2000 · May 02 '25 03:05

> I see. Could you try pip uninstall maniskill and then install mani-skill-nightly?

Thanks. I will report back when I get home and have time to test. For the time being, I am using multiprocessing to speed up the ms2-based Simpler.
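
The isolation pattern looks roughly like this (a sketch; run_single_task_eval is a hypothetical helper), with the side benefit that each child process returns all of its VRAM to the OS when it exits:

    import multiprocessing as mp

    def _worker(task_name, queue):
        # import the simulator inside the child so no CUDA/SAPIEN state
        # ever lives in the parent process
        success_rate = run_single_task_eval(task_name)  # hypothetical helper
        queue.put(success_rate)

    if __name__ == "__main__":
        mp.set_start_method("spawn")  # required when children use CUDA
        for task_name in ["task_a", "task_b"]:
            queue = mp.Queue()
            proc = mp.Process(target=_worker, args=(task_name, queue))
            proc.start()
            success_rate = queue.get()  # read before join to avoid deadlocks
            proc.join()  # all VRAM held by the child is released here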

IrvingF7 · May 02 '25 11:05