SimplerEnv

Can it run on a Linux system with A100 GPUs?

Open RZFan525 opened this issue 1 year ago • 30 comments

RZFan525, Jul 25 '24 07:07

Yes. Please follow the instructions in the README and its troubleshooting section. Note that rendering for the drawer tasks will be slow because they use ray tracing.

xuanlinli17, Jul 25 '24 13:07

Thank you for your reply. However, I encountered the same error as #7. Also, when I try to install vulkan-utils with sudo apt-get install vulkan-utils, I get the error: The package vulkan-utils could not be located. I don't have any computers with RTX GPUs. How can I run it?

RZFan525, Jul 26 '24 02:07

I followed https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html#vulkan to add the three JSON files, but it still does not work.

RZFan525, Jul 26 '24 03:07

Did you run sudo apt update? If vulkan-utils is still not found, it may be because you are on Ubuntu 22.04.

Try sudo apt install vulkan-tools

xuanlinli17, Jul 27 '24 20:07
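A minimal sketch, not taken from the thread, of how the result of installing vulkan-tools could be checked from Python. It only assumes that the package provides a `vulkaninfo` binary on PATH; the function name `probe_vulkan` and the status strings are hypothetical.

```python
import shutil
import subprocess


def probe_vulkan(binary="vulkaninfo", timeout=30):
    """Return (status, detail) after trying to run a Vulkan probe binary."""
    path = shutil.which(binary)
    if path is None:
        # On Ubuntu 22.04 the thread suggests the vulkan-tools package.
        return "missing", f"{binary} is not on PATH"
    try:
        result = subprocess.run(
            [path], capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "timeout", f"{binary} did not finish within {timeout}s"
    if result.returncode != 0:
        # Keep only the tail of stderr; the full output can be very long.
        return "failed", result.stderr.strip()[-500:]
    return "ok", f"{binary} exited with status 0"


if __name__ == "__main__":
    status, detail = probe_vulkan()
    print(status, detail)
```

A status of "ok" here only shows that the Vulkan loader can enumerate a device for this user; as the following comments show, the SAPIEN renderer may still fail.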

Thank you for your reply. I tried it, and it installed successfully. However, the same error appeared.

Also, I found that vulkaninfo works without /usr/share/vulkan/icd.d/nvidia_icd.json, /usr/share/glvnd/egl_vendor.d/10_nvidia.json, and /etc/vulkan/implicit_layer.d/nvidia_layers.json. But when I follow https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html#vulkan and manually add these three files, vulkaninfo fails with the error ERROR_OUT_OF_HOST_MEMORY.

In any case, the following error always appears, whether or not vulkaninfo works.

[2024-07-28 11:58:15.019] [svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[2024-07-28 11:58:15.019] [svulkan2] [warning] Continue without GLFW.
Traceback (most recent call last):
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv-OpenVLA/test.py", line 4, in <module>
    env = simpler_env.make('google_robot_pick_coke_can')
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv-OpenVLA/simpler_env/__init__.py", line 78, in make
    env = gym.make(env_name, obs_mode="rgbd", **kwargs)
  File "/cpfs01/user/liupengfei/rzfan/miniconda3/envs/simpler_env/lib/python3.10/site-packages/gymnasium/envs/registration.py", line 802, in make
    env = env_creator(**env_spec_kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/utils/registration.py", line 92, in make
    env = env_spec.make(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/utils/registration.py", line 34, in make
    return self.cls(**_kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/custom_scenes/grasp_single_in_scene.py", line 630, in __init__
    super().__init__(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/custom_scenes/grasp_single_in_scene.py", line 540, in __init__
    super().__init__(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/custom_scenes/grasp_single_in_scene.py", line 64, in __init__
    super().__init__(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/custom_scenes/base_env.py", line 134, in __init__
    super().__init__(**kwargs)
  File "/cpfs01/user/liupengfei/rzfan/SimplerEnv/ManiSkill2_real2sim/mani_skill2_real2sim/envs/sapien_env.py", line 107, in __init__
    self._renderer = sapien.SapienRenderer(**renderer_kwargs)
RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed

RZFan525, Jul 28 '24 03:07
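A small diagnostic sketch, not from the thread, that gathers the facts discussed in the comment above in one place: whether DISPLAY is set and which of the three NVIDIA JSON files exist. The file paths are copied from that comment; the function name `collect_diagnostics` and its parameters are hypothetical, and whether the files are actually required depends on the machine.

```python
import os

# Paths copied from the comment above; whether they are needed is setup-dependent.
NVIDIA_JSON_FILES = (
    "/usr/share/vulkan/icd.d/nvidia_icd.json",
    "/usr/share/glvnd/egl_vendor.d/10_nvidia.json",
    "/etc/vulkan/implicit_layer.d/nvidia_layers.json",
)


def collect_diagnostics(environ=None, exists=os.path.exists):
    """Report the DISPLAY value and the presence of each NVIDIA JSON file."""
    environ = os.environ if environ is None else environ
    report = {"DISPLAY": environ.get("DISPLAY")}
    for path in NVIDIA_JSON_FILES:
        report[path] = exists(path)
    return report


if __name__ == "__main__":
    for key, value in collect_diagnostics().items():
        print(f"{key}: {value}")
```

Posting this output together with the traceback makes it easier to compare setups across servers.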

I don't know how to run it :(

I have tried three different servers with A100 GPUs, and all of them hit the same error. :(

RZFan525, Jul 28 '24 04:07

Are you setting the CUDA devices properly? Also make sure the NVIDIA driver version is at least 535. Older NVIDIA drivers might not work.

You can make a fake display like this:

tmux new -s 1
sudo X :0 &
[detach from tmux: Ctrl-b, then d]
export DISPLAY=:0

xuanlinli17, Jul 29 '24 17:07
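A hedged sketch of how the driver-version advice above could be checked from Python. It relies on the `nvidia-smi --query-gpu=driver_version --format=csv,noheader` query; the helper names are hypothetical, and treating exactly 535 as acceptable is an assumption about what the comment means.

```python
import subprocess

# Threshold from the comment above; accepting 535 itself is an assumption.
MIN_DRIVER_MAJOR = 535


def driver_major(version):
    """Extract the major number from a version string such as '535.104.05'."""
    return int(version.strip().split(".")[0])


def driver_is_recent_enough(version, minimum=MIN_DRIVER_MAJOR):
    """True if the driver's major version is at least the minimum."""
    return driver_major(version) >= minimum


def query_driver_versions():
    """Ask nvidia-smi for the driver version, one output line per visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        for version in query_driver_versions():
            verdict = "ok" if driver_is_recent_enough(version) else "too old"
            print(version, verdict)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("could not query nvidia-smi:", exc)
```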

Thank you for getting back to me.

The servers I used run inside Docker. I switched to another server, and that made it work.

However, I hit another error, which seems to come from the lack of a display.

RuntimeError: Create window failed: context is not created with present support

Do you have any suggestions for how I can observe the environment and the actions as they are executed?

RZFan525, Jul 30 '24 03:07

Inside Docker, you might want to forward the (fake) display created in the host shell (e.g., with sudo X :0 &) to the Docker container.

However, the SIMPLER environments shouldn't create a window unless you are visualizing robots using the utility scripts.

xuanlinli17, Jul 30 '24 03:07

I'm new to robotics, so I want to visualize the simulation environment to understand it better. Maybe it would be better to output a video.

RZFan525, Jul 30 '24 03:07

The evaluation videos are automatically saved.

xuanlinli17, Jul 30 '24 05:07

Thank you!

I can run the script scripts/openvla_bridge.sh, but it suddenly reports an error after running for a while.

image

RZFan525, Jul 30 '24 08:07

If you consecutively create 2 environments in ipython, does it still report an error?

xuanlinli17, Jul 30 '24 16:07

When I create 2 environments, it works. But there is a warning:

[2024-07-31 03:20:06.870] [svulkan2] [warning] A second renderer will share the same internal context with the first one. Arguments passed to constructor will be ignored.

RZFan525, Jul 31 '24 03:07

I don't know why. But I also tried SimplerEnv-OpenVLA/scripts/openvla_drawer_variant_agg.sh, and it successfully outputs the average success.

image

RZFan525, Jul 31 '24 03:07

I find that the error appears when obj_episode_id is 11 in any script that sets obj-variation-mode to episode.

image

RZFan525, Aug 01 '24 09:08

That's strange; episode 11 doesn't introduce new objects.

xuanlinli17, Aug 01 '24 13:08

Could you give me some instructions on how to debug? Thank you very much!!

RZFan525, Aug 02 '24 05:08

I actually don't know... and sorry that I don't have much bandwidth at the moment to look closely.

xuanlinli17, Aug 02 '24 05:08

Ok. Thank you for your reply.

RZFan525, Aug 02 '24 05:08

Also, you might create a fake display, e.g. with sudo X :0 &; export DISPLAY=:0 or xvfb-run -a {script}, to see if it works.

xuanlinli17, Aug 02 '24 19:08
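A minimal sketch of the xvfb-run suggestion above, wrapped in Python so it can be reused from a launcher script. The helper names are hypothetical. It assumes `xvfb-run` is installed (on Debian/Ubuntu it ships with the xvfb package), and a virtual display may not be enough to resolve the Vulkan errors reported in this thread.

```python
import os
import shutil
import subprocess


def xvfb_command(script_args):
    """Prefix a command with 'xvfb-run -a' so it runs under a virtual X display."""
    return ["xvfb-run", "-a", *script_args]


def run_headless(script_args):
    """Run the command directly if DISPLAY is set, otherwise under xvfb-run."""
    if os.environ.get("DISPLAY"):
        return subprocess.run(list(script_args)).returncode
    if shutil.which("xvfb-run") is None:
        raise RuntimeError("xvfb-run not found; install the xvfb package")
    return subprocess.run(xvfb_command(script_args)).returncode


# Example call (not executed here), using the script named earlier in the thread:
# run_headless(["bash", "scripts/openvla_bridge.sh"])
```

The `-a` flag lets xvfb-run pick a free display number, which avoids clashes when several jobs share one server.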

Thank you. I tried this command, but it does not work. The error is the same, and I don't know why.

RZFan525, Aug 03 '24 03:08

Hello:

> I don't know how to run it :(
>
> I have tried three different servers with A100 GPUs, and all of them hit the same error. :(

Same error on an A100 GPU. "libGLX_nvidia.so.0" does not exist on the A100 machine.

Does anyone have an updated solution? Thanks a lot!

COST-97, Sep 18 '24 07:09

> Hello:
>
> > I don't know how to run it :( I have tried three different servers with A100 GPUs, and all of them hit the same error. :(
>
> Same error on an A100 GPU. "libGLX_nvidia.so.0" does not exist on the A100 machine.
>
> Does anyone have an updated solution? Thanks a lot!

Could you try the troubleshooting section in the README?

xuanlinli17, Sep 18 '24 15:09

Same error on an A100 GPU. "libGLX_nvidia.so.0" does not exist on the A100 machine.

Does anyone have an updated solution? Thanks a lot!

Akila-Ayanthi, Jan 12 '25 08:01

> I find that the error appears when obj_episode_id is 11 in any script that sets obj-variation-mode to episode.
>
> image

I had a similar issue, but after I ran sudo apt-get install libglvnd-dev, the error disappeared and it worked.

yinsong1986, Jan 17 '25 03:01

> > Did you run sudo apt update? If vulkan-utils is still not found, it may be because you are on Ubuntu 22.04.
> >
> > Try sudo apt install vulkan-tools
>
> Thank you for your reply. I tried it, and it installed successfully. However, the same error appeared.
>
> Also, I found that vulkaninfo works without /usr/share/vulkan/icd.d/nvidia_icd.json, /usr/share/glvnd/egl_vendor.d/10_nvidia.json, and /etc/vulkan/implicit_layer.d/nvidia_layers.json. But when I follow https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html#vulkan and manually add these three files, vulkaninfo fails with the error ERROR_OUT_OF_HOST_MEMORY.
>
> In any case, the following error always appears, whether or not vulkaninfo works.
>
> RuntimeError: vk::Instance::enumeratePhysicalDevices: ErrorInitializationFailed

I tried this as well, but I get a different error: it hangs when creating the env. See https://github.com/simpler-env/SimplerEnv/issues/89

LukeLIN-web, May 02 '25 01:05

> I find that the error appears when obj_episode_id is 11 in any script that sets obj-variation-mode to episode.
>
> image

I think this is because it can only create a maximum of 11 envs, but I don't know where that limit is defined. I hit the same error too.

LukeLIN-web, May 02 '25 01:05
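The "at most 11 envs" idea above is an unverified hypothesis. A minimal sketch of how it could be tested: create environments one after another and record where the first failure occurs, with and without closing each env in between. The function name and parameters are hypothetical; `make_env` is any zero-argument factory, for example one wrapping the `simpler_env.make('google_robot_pick_coke_can')` call shown in the traceback earlier in the thread.

```python
def count_successful_creations(make_env, attempts, close_between=True):
    """Create envs consecutively; return (number created, first error or None)."""
    created = 0
    open_envs = []
    try:
        for _ in range(attempts):
            try:
                env = make_env()
            except Exception as exc:
                # Stop at the first failure and report how far we got.
                return created, exc
            created += 1
            if close_between:
                env.close()
            else:
                open_envs.append(env)
        return created, None
    finally:
        # Release anything still open, whether or not a failure occurred.
        for env in open_envs:
            env.close()


# Example call (not executed here; assumes simpler_env is installed):
# import simpler_env
# factory = lambda: simpler_env.make('google_robot_pick_coke_can')
# print(count_successful_creations(factory, attempts=20, close_between=False))
```

If creation fails at the same count only when `close_between=False`, that would point toward a resource that is not released between episodes rather than toward episode 11 itself.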

> Same error on an A100 GPU. "libGLX_nvidia.so.0" does not exist on the A100 machine.
>
> Does anyone have an updated solution? Thanks a lot!

Refer to https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html#vulkan

LukeLIN-web, May 02 '25 01:05

> > Same error on an A100 GPU. "libGLX_nvidia.so.0" does not exist on the A100 machine. Does anyone have an updated solution? Thanks a lot!
>
> Refer to https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html#vulkan

I created the JSON file according to this, but there are still errors. Does anyone have a solution? Thank you!

CostaliyA, Jun 02 '25 08:06
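A hedged sketch, not from the thread, for the case where the JSON file exists but errors persist: a Vulkan ICD manifest only names a driver library in its `ICD.library_path` field, so creating the manifest does not help if that library cannot be loaded. The helper names below are hypothetical, and the sample manifest values in the comment are illustrative only.

```python
import ctypes
import json


def icd_library_path(manifest_text):
    """Return the driver library named by a Vulkan ICD manifest (ICD.library_path)."""
    manifest = json.loads(manifest_text)
    return manifest["ICD"]["library_path"]


def can_load(library):
    """Try to open the shared library; False means the dynamic loader could not."""
    try:
        ctypes.CDLL(library)
    except OSError:
        return False
    return True


# Example call (not executed here), using the manifest path mentioned in the thread:
# with open("/usr/share/vulkan/icd.d/nvidia_icd.json") as f:
#     lib = icd_library_path(f.read())
# print(lib, "loadable" if can_load(lib) else "NOT loadable")
#
# Illustrative manifest text only:
# {"file_format_version": "1.0.0",
#  "ICD": {"library_path": "libGLX_nvidia.so.0", "api_version": "1.2.155"}}
```

If the library is reported as not loadable, the manifest is pointing at a file the system does not have, which would match the "libGLX_nvidia.so.0 does not exist" reports above; in a container this may mean the driver's graphics libraries were not mounted.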