IsaacLab

[Bug Report] LLVM ERROR: out of memory

Open thkkk opened this issue 1 year ago • 5 comments

Describe the bug

I'm running Orbit training and it fails with `LLVM ERROR: out of memory`. However, the machine has plenty of RAM (128 GB), and Isaac Sim only uses about 8 GB of it.
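The report compares Isaac Sim's footprint (about 8 GB) with the machine's 128 GB. A failed allocation inside LLVM can in principle also come from a per-process limit, such as the virtual address-space cap set with `ulimit -v`, and not only from exhausted physical RAM. Recording both figures alongside the error might help narrow things down. The helper below is a hypothetical, Linux-only sketch built on Python's standard `os` and `resource` modules. It is not part of Orbit or Isaac Sim, and nothing in this thread establishes that such a limit was the cause.

```python
import os
import resource


def memory_snapshot() -> dict:
    """Hypothetical diagnostic helper (Linux only): memory figures for an OOM report."""
    # Total physical RAM = page size * number of physical pages.
    page_size = os.sysconf("SC_PAGE_SIZE")
    total_bytes = page_size * os.sysconf("SC_PHYS_PAGES")

    # Peak resident set size of *this* process; Linux reports it in kilobytes.
    peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    # RLIMIT_AS caps the virtual address space (what `ulimit -v` sets).
    # RLIM_INFINITY means no cap is in place.
    soft_as, _hard_as = resource.getrlimit(resource.RLIMIT_AS)

    return {
        "total_ram_gib": total_bytes / 2**30,
        "peak_rss_gib": peak_rss_kb / 2**20,
        "address_space_limit": None if soft_as == resource.RLIM_INFINITY else soft_as,
    }


if __name__ == "__main__":
    print(memory_snapshot())
```

Because `ru_maxrss` only covers the calling process, the helper would have to run inside the training process itself to reflect Isaac Sim's usage.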

Steps to reproduce

I'm not sure whether this can be reproduced on someone else's copy of Orbit, because I've added many local extensions for other robots. The command I used was `python source/standalone/workflows/rsl_rl/train.py --task Isaac-Lift-Cube-Franka-IK-Abs-v0 --headless --num_envs 2`.

I've attached the kit log, the runtime log, and the dmesg output:

training_log_and_dmesg.md, kit_20240522_161807.log


System Info

Describe the characteristics of your environment:

  • Commit: our branch is based on e444df7d75094dfd1339e9bfea7e2ac24448c4f4
  • Isaac Sim Version: 2023.1.1
  • OS: Ubuntu 24.04
  • GPU: RTX 4090
  • CUDA: 12.4
  • GPU Driver: 550.67
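The driver version turns out to matter later in this thread (550.67, 535.171.04 and 555.42.02 all come up). A small helper could collect the GPU name and driver version automatically and turn the version into a tuple of integers, so that versions compare numerically rather than as strings. The sketch below assumes `nvidia-smi` is on `PATH` and uses its `--query-gpu=name,driver_version --format=csv,noheader` output. The function names are hypothetical, and only `parse_gpu_query` can be exercised without a GPU.

```python
import subprocess


def parse_gpu_query(csv_text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a GPU name can never break the parse.
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        gpus.append({
            "name": name,
            "driver": driver,
            # "535.171.04" -> (535, 171, 4), so versions compare numerically.
            "driver_tuple": tuple(int(part) for part in driver.split(".")),
        })
    return gpus


def query_gpus() -> list[dict]:
    """Run nvidia-smi (hypothetical wrapper; requires an NVIDIA driver) and parse it."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_query(out)
```

For example, `parse_gpu_query("NVIDIA GeForce RTX 4090, 550.67")` yields a driver tuple of `(550, 67)`, which sorts between `(535, 171, 4)` and `(555, 42, 2)`.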

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
  • [x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo

thkkk · May 23 '24

I also encountered the same problem.

Index10808 · May 25 '24

Can you switch your GPU driver from 550.67 to 535.171.04 instead? The problem most likely comes from there.

Mayankm96 · Jun 04 '24

Thanks for your suggestions!

  1. My friend uses NVIDIA driver 550.67 and does not get this error.
  2. I have now switched to NVIDIA driver 535.171.04. The `LLVM ERROR: out of memory` no longer appears, but other errors do. I'll let you know if I manage to get it running.

thkkk · Jun 05 '24

Hi @thkkk, did you solve the problem? :) Is it really caused by the NVIDIA driver version?

Bariona · Jul 09 '24

I solved this problem by reinstalling Ubuntu and the NVIDIA driver; I'm now on driver 555.42.02. However, I can't tell whether the NVIDIA driver was actually the cause.

thkkk · Jul 15 '24