SAPIEN [Question] Why does camera.take_picture() take too long?

Hi, I am using SAPIEN to collect robot data, and I have run into an issue: each call to camera.take_picture() takes around 0.135921 seconds, which significantly slows down my program. My SAPIEN simulation environment is set up as follows:

def setup_scene(self, 
                    timestep: float = 1 / 150, 
                    ground_height: float = 0.,
                    static_friction: float = 0.5, dynamic_friction: float = 0.5, restitution: float = 0.,
                    ambient_light: list[float] = [0.5, 0.5, 0.5],
                    shadow: bool = True,
                    direction_lights: list[list[float]] = [[[0, 0.5, -1], [0.5, 0.5, 0.5]]],
                    point_lights: list = [[[1, 0, 1.8], [1, 1, 1]], [[-1, 0, 1.8], [1, 1, 1]]],
                    camera_xyz: list[float] = [0.4, 0.22, 1.5], camera_rpy: list[float] = [0, -0.8, 2.45],):
        '''
        Set the scene
            - Set up the basic scene: light source, viewer.
        '''
        self.engine = sapien.Engine()
        # declare sapien renderer
        from sapien.render import set_global_config
        set_global_config(max_num_materials = 50000, max_num_textures = 50000)
        self.renderer = sapien.SapienRenderer()
        # give renderer to sapien sim
        self.engine.set_renderer(self.renderer)
        
        sapien.render.set_camera_shader_dir("rt")
        sapien.render.set_ray_tracing_samples_per_pixel(32)
        sapien.render.set_ray_tracing_path_depth(8)
        sapien.render.set_ray_tracing_denoiser("oidn")

        # declare sapien scene
        scene_config = sapien.SceneConfig()
        self.scene = self.engine.create_scene(scene_config)
        # set simulation timestep
        self.scene.set_timestep(timestep)
        # add ground to scene
        self.scene.add_ground(ground_height)
        # set default physical material
        self.scene.default_physical_material = self.scene.create_physical_material(
            static_friction,
            dynamic_friction,
            restitution,
        )
        # give some white ambient light of moderate intensity
        self.scene.set_ambient_light(ambient_light)
        # add directional lights; each entry is [direction, color]
        for direction_light in direction_lights:
            self.scene.add_directional_light(
                direction_light[0], direction_light[1], shadow=shadow
            )
        # add point lights; each entry is [position, color]
        for point_light in point_lights:
            self.scene.add_point_light(point_light[0], point_light[1], shadow=shadow)

        # initialize viewer with camera position and orientation
        if self._render:
            self.viewer = Viewer(self.renderer)
            self.viewer.set_scene(self.scene)
            self.viewer.set_camera_xyz(
                x=camera_xyz[0],
                y=camera_xyz[1],
                z=camera_xyz[2],
            )
            self.viewer.set_camera_rpy(
                r=camera_rpy[0],
                p=camera_rpy[1],
                y=camera_rpy[2],
            )

I have defined multiple cameras using the following code:

def load_camera(self):
        '''
        Add cameras and set camera parameters
        '''
        self.sensor_cameras = dict()

        camera_top = self.scene.add_camera(
            name="camera_top", 
            width=SENSOR_CAMERA_WIDTH, height=SENSOR_CAMERA_HEIGHT, 
            fovy=np.deg2rad(SENSOR_CAMERA_FOVY), 
            near=SENSOR_CAMERA_NEAR, far=SENSOR_CAMERA_FAR)
        camera_top.entity.set_pose(rand_pose(position_reference=[0, 0, 1.5],
                                             x_limit=[-0.01, 0.01], y_limit=[-0.01, 0.01], z_limit=[-0.01, 0.01],
                                             rotation_reference=[0., np.pi / 2, np.pi / 2], 
                                             euler_angles_limit=[np.pi / 36, np.pi / 36, np.pi / 36]))
        self.sensor_cameras["camera_top"] = camera_top

        camera_left = self.scene.add_camera(
            name="camera_left", 
            width=SENSOR_CAMERA_WIDTH, height=SENSOR_CAMERA_HEIGHT, 
            fovy=np.deg2rad(SENSOR_CAMERA_FOVY), 
            near=SENSOR_CAMERA_NEAR, far=SENSOR_CAMERA_FAR)
        camera_left.entity.set_pose(rand_pose(position_reference=[TABLE_LENGTH / 2, 0, TABLE_HEIGHT + 0.075],
                                              x_limit=[-0.01, 0.01], y_limit=[-0.01, 0.01], z_limit=[-0.01, 0.01],
                                              rotation_reference=[0., 0., np.pi],
                                              euler_angles_limit=[np.pi / 36, np.pi / 36, np.pi / 36]))
        self.sensor_cameras["camera_left"] = camera_left

        camera_wrist = self.scene.add_mounted_camera(
            name="camera_wrist",
            mount=self.robot_end_effector_link.entity,
            pose=rand_pose(position_reference=[0.05, 0, 0.025,], 
                           x_limit=[-0.01, 0.01], y_limit=[-0.01, 0.01], z_limit=[-0.01, 0.01],
                           rotation_reference=[0., -np.pi/2, np.pi], 
                           euler_angles_limit=[np.pi / 36, np.pi / 36, np.pi / 36]),
            width=SENSOR_CAMERA_WIDTH, height=SENSOR_CAMERA_HEIGHT,
            fovy=np.deg2rad(SENSOR_CAMERA_FOVY),
            near=SENSOR_CAMERA_NEAR, far=SENSOR_CAMERA_FAR
        )
        self.sensor_cameras["camera_wrist"] = camera_wrist

What could be causing the significant delay in the program, and are there any ways to speed it up? For example, is it possible to configure the camera to selectively capture only certain data? For my code, object surface normals and segmentation results are not needed.

Jan 05 '25 07:01 yilin404

From your code, it looks like you are using ray tracing with 32 samples per pixel, which can be quite slow depending on your hardware. For example, at 1K resolution it takes more than 1 second on my low-end integrated graphics card, so 0.13 seconds is not surprising. If you need it to run faster, you can switch to rasterization, reduce the number of samples, or reduce the image resolution. However, if you are using a high-end GPU such as an RTX 4090, this may indicate that the GPU is not being picked up properly and the renderer is falling back to integrated graphics or sometimes even the CPU. You can check this by calling sapien.render.set_log_level("info") and looking for the selected graphics device when the renderer is created.
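
As a rough way to compare these options, the sketch below estimates how many ray-traced samples one frame needs, assuming the cost scales roughly with width × height × samples per pixel. That scaling is only a first-order assumption: it ignores path depth, the denoiser, and fixed per-frame overhead. The 640×480 resolution and the helper names are hypothetical, since SENSOR_CAMERA_WIDTH and SENSOR_CAMERA_HEIGHT are not shown in the question.

```python
def samples_per_frame(width: int, height: int, spp: int) -> int:
    """Samples traced for one frame; used here as a first-order cost proxy."""
    return width * height * spp


def relative_cost(baseline: tuple[int, int, int],
                  candidate: tuple[int, int, int]) -> float:
    """Ratio of candidate samples to baseline samples (1.0 means no change)."""
    return samples_per_frame(*candidate) / samples_per_frame(*baseline)


# (width, height, samples per pixel). The resolution is a hypothetical
# stand-in; 32 samples per pixel matches the setup in the question.
baseline = (640, 480, 32)
options = {
    "8 samples per pixel": (640, 480, 8),
    "half resolution": (320, 240, 32),
    "both": (320, 240, 8),
}

for name, config in options.items():
    print(f"{name}: {relative_cost(baseline, config):.4f}x baseline samples")
```

In the setup from the question, the sample count is the value passed to sapien.render.set_ray_tracing_samples_per_pixel. Fewer samples generally means a noisier image, so it is worth checking image quality after each change.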

Jan 05 '25 13:01 fbxiang

Capturing less data probably will not speed up rendering at all, especially for ray tracing. Even for rasterization, skipping normals and segmentation only starts to make a difference once I render more than 1 billion pixels per second, after everything else has been optimized to the limit.
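
To put the 1 billion pixels per second figure in perspective, here is a back-of-the-envelope estimate of the pixel throughput in the question. It assumes a hypothetical 640×480 image per take_picture() call, because the real SENSOR_CAMERA_WIDTH and SENSOR_CAMERA_HEIGHT are not given; only the 0.135921-second timing comes from the thread.

```python
SECONDS_PER_CALL = 0.135921   # take_picture() time reported in the question
WIDTH, HEIGHT = 640, 480      # hypothetical resolution, not from the thread
THRESHOLD = 1e9               # pixels per second mentioned in the reply above

# Pixels produced per second if every call renders one WIDTH x HEIGHT image.
pixels_per_second = WIDTH * HEIGHT / SECONDS_PER_CALL

# How far the current throughput is below the threshold.
gap = THRESHOLD / pixels_per_second

print(f"{pixels_per_second:,.0f} pixels/s, about {gap:.0f}x below the threshold")
```

Under that assumed resolution, the run is hundreds of times below the point where dropping extra outputs would matter, which is consistent with changing the shader, sample count, or resolution first.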

Jan 05 '25 13:01 fbxiang

[Question] Why does camera.take_picture() take too long?