Running on a Jetson Orin NX
Good morning,
I have been trying to make the exo project work on my Orin NX without success. Here is the error I get when running exo:
(exo) sgoudelis@jetson:~/projects/exo$ exo
Selected inference engine: None
  _____  _____
 / _ \ \/ / _ \
|  __/>  < (_) |
 \___/_/\_\___/
Detected system: Linux
Inference engine name after selection: tinygrad
Traceback (most recent call last):
File "/home/sgoudelis/miniconda3/envs/exo/bin/exo", line 33, in <module>
sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/bin/exo", line 25, in importlib_load_entry_point
return next(matches).load()
^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/importlib/metadata/__init__.py", line 205, in load
module = import_module(match.group('module'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 999, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "/home/sgoudelis/projects/exo/exo/main.py", line 106, in <module>
inference_engine = get_inference_engine(inference_engine_name, shard_downloader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/projects/exo/exo/inference/inference_engine.py", line 69, in get_inference_engine
from exo.inference.tinygrad.inference import TinygradDynamicShardInferenceEngine
File "/home/sgoudelis/projects/exo/exo/inference/tinygrad/inference.py", line 4, in <module>
from exo.inference.tinygrad.models.llama import Transformer, TransformerShard, convert_from_huggingface, fix_bf16, sample_logits
File "/home/sgoudelis/projects/exo/exo/inference/tinygrad/models/llama.py", line 2, in <module>
from tinygrad import Tensor, Variable, TinyJit, dtypes, nn, Device
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/__init__.py", line 5, in <module>
from tinygrad.tensor import Tensor # noqa: F401
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/tensor.py", line 12, in <module>
from tinygrad.device import Device, BufferSpec
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/device.py", line 226, in <module>
class CPUProgram:
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/tinygrad/device.py", line 227, in CPUProgram
helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'kernel32' if sys.platform == "win32" else 'gcc_s'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/ctypes/__init__.py", line 379, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so: invalid ELF header
Looking into the .so file, I get this:
(exo) sgoudelis@jetson:~/projects/exo$ file /home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so
/home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so: ASCII text
(exo) sgoudelis@jetson:~/projects/exo$ more /home/sgoudelis/miniconda3/envs/exo/lib/libgcc_s.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library. */
GROUP ( libgcc_s.so.1 -lgcc )
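So the file is a GNU ld linker script, not an ELF object: tinygrad's ctypes.util.find_library('gcc_s') resolves to this conda-shipped script, and dlopen() cannot parse ASCII text, hence the "invalid ELF header". Presumably moving the script aside would let the loader fall back to the real libgcc_s.so.1, e.g. (path assumes my conda env named exo):

(exo) sgoudelis@jetson:~/projects/exo$ mv ~/miniconda3/envs/exo/lib/libgcc_s.so ~/miniconda3/envs/exo/lib/libgcc_s.so.bak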
Does anyone have an idea how to make exo work on the Jetson Orin?
UPDATE:
Moving the mentioned linker script out of the way actually gets exo further. It then fails in a different way:
Traceback (most recent call last):
File "/home/sgoudelis/miniconda3/envs/exo/bin/exo", line 33, in <module>
sys.exit(load_entry_point('exo', 'console_scripts', 'exo')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/projects/exo/exo/main.py", line 385, in run
loop.run_until_complete(main())
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/sgoudelis/projects/exo/exo/main.py", line 349, in main
await node.start(wait_for_peers=args.wait_for_peers)
File "/home/sgoudelis/projects/exo/exo/orchestration/node.py", line 59, in start
self.device_capabilities = await device_capabilities()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/projects/exo/exo/topology/device_capabilities.py", line 153, in device_capabilities
return await linux_device_capabilities()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/projects/exo/exo/topology/device_capabilities.py", line 188, in linux_device_capabilities
gpu_memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/pynvml.py", line 2934, in nvmlDeviceGetMemoryInfo
_nvmlCheckReturn(ret)
File "/home/sgoudelis/miniconda3/envs/exo/lib/python3.12/site-packages/pynvml.py", line 979, in _nvmlCheckReturn
raise NVMLError(ret)
pynvml.NVMLError_NotSupported: Not Supported
I am a complete noob when it comes to NVIDIA CUDA stuff btw. I am guessing this happens because the Orin has shared memory.
ANOTHER UPDATE:
Exo does work with the Orin NX 16GB: bypassing the part of the code that queries the VRAM amount and giving it a bogus number makes exo boot up just fine, with GPU-accelerated inference.
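For reference, a minimal sketch of that bypass, assuming the call sites shown in the traceback above (the helper name is hypothetical, not exo's actual code):

import pynvml

def jetson_safe_gpu_memory_total(handle) -> int:
    # Total GPU memory in bytes, with a Jetson fallback.
    try:
        return pynvml.nvmlDeviceGetMemoryInfo(handle).total
    except pynvml.NVMLError_NotSupported:
        # Jetson iGPUs share system RAM and do not expose NVML memory
        # queries, so report MemTotal instead ("bogus" but workable).
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    return int(line.split()[1]) * 1024  # kB -> bytes
        return 0

linux_device_capabilities() could then call this instead of calling pynvml.nvmlDeviceGetMemoryInfo(handle) directly.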
I would love some feedback from one of the developers of the Exo project about this. Please feel free to comment.
I tried on two Xavier AGX units; exo cannot work there, it needs libnvidia-ml.so.1:
Detected system: Linux
Inference engine name after selection: tinygrad
Using inference engine: TinygradDynamicShardInferenceEngine with shard downloader: SingletonShardDownloader
[]
Chat interface started:
 - http://192.168.1.15:52415
 - http://172.17.0.1:52415
 - http://127.0.0.1:52415
ChatGPT API endpoint served at:
 - http://192.168.1.15:52415/v1/chat/completions
 - http://172.17.0.1:52415/v1/chat/completions
 - http://127.0.0.1:52415/v1/chat/completions
has_read=True, has_write=True
Traceback (most recent call last):
  File "/home/-----/exo/.venv/lib/python3.12/site-packages/pynvml.py", line 2248, in _LoadNvmlLibrary
    nvmlLib = CDLL("libnvidia-ml.so.1")
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/-----/miniconda3/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
Same error "invalid ELF header" on WSL2
I successfully ran exo on a Jetson Orin NX 16GB using "source install.sh"; doing "pip install -e ." in a conda Python 3.12 environment hits the same "anaconda3/envs/exo/lib/libgcc_s.so: invalid ELF header" error.
Furthermore, the other error, "pynvml.NVMLError_NotSupported: Not Supported", occurs because Jetson devices do not support pynvml. You have to change the functions that get the GPU and memory information on Jetson devices, which is not that difficult:
try:
    # Identify the Jetson SoC from the device tree.
    with open("/proc/device-tree/compatible") as f:
        compatible = f.read().lower()
    if "tegra194" in compatible:
        gpu_name = "XAVIER"
    elif "tegra210" in compatible:
        gpu_name = "TX1"
    elif "tegra186" in compatible:
        gpu_name = "TX2"
    elif "tegra234" in compatible:  # Orin family
        gpu_name = "Jetson_NX"
    else:
        gpu_name = "JETSON_GPU"
    # The iGPU shares system RAM, so report MemTotal as "VRAM".
    with open("/proc/meminfo") as f:
        for line in f:
            if "MemTotal" in line:
                total_mem = int(line.split()[1]) * 1024  # kB -> bytes
                break
        else:
            total_mem = 0
    # Mimic the object pynvml.nvmlDeviceGetMemoryInfo() would return.
    gpu_memory_info = type('', (object,), {"total": total_mem})()
except OSError:
    # Not a device-tree platform, or the files are missing.
    gpu_name = "JETSON_GPU"
    gpu_memory_info = type('', (object,), {"total": 0})()
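To wire those values back into exo, something like the following might work; the field names assume exo's DeviceCapabilities and DeviceFlops dataclasses from device_capabilities.py, and the memory unit (which appears to be megabytes there) should be verified before copying:

from exo.topology.device_capabilities import DeviceCapabilities, DeviceFlops

def jetson_device_capabilities(gpu_name: str, total_mem: int) -> DeviceCapabilities:
    return DeviceCapabilities(
        model=f"Linux Box ({gpu_name})",
        chip=gpu_name,
        memory=total_mem // 2**20,  # bytes -> MB
        flops=DeviceFlops(fp32=0, fp16=0, int8=0),  # unknown for Tegra; zeros as placeholder
    )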
Here comes a problem: after I downloaded llama3.2-8B, it could be loaded into memory, but then the process was killed.
I ran exo on two nodes successfully, but the model seems to be loaded into memory twice or more. The inference time is too long; llama3.2:1b runs at just 2 tokens/s.
@Mr-lwd Yes, it loaded at least twice; llama3.2-3b is very, very slow.