Deploying openpi0 on a Jetson Orin runs into a problem: the code gets stuck.
Without Docker
Terminal window 1:
uv run examples/simple_client/main.py --env DROID
Terminal window 2:
uv run scripts/serve_policy.py --env DROID
uv run scripts/serve_policy.py --env DROID gets stuck in ~/openpi-main/src/openpi/models/model.py at:
with ocp.PyTreeCheckpointer() as ckptr:
    metadata = ckptr.metadata(params_path)
    item = {"params": metadata["params"]}
    params = ckptr.restore(
        params_path,
        ocp.args.PyTreeRestore(
            item=item,
            restore_args=jax.tree.map(
                lambda _: ocp.ArrayRestoreArgs(sharding=sharding, restore_type=restore_type, dtype=dtype), item
            ),
        ),
    )["params"]
There is some log output:

INFO:root:Loading model...
INFO:2025-03-17 19:44:58,699:jax._src.xla_bridge:945: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:2025-03-17 19:44:58,708:jax._src.xla_bridge:945: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
[CudaDevice(id=0)] Orin
INFO:absl:orbax-checkpoint version: 0.11.1
INFO:absl:Created BasePyTreeCheckpointHandler: pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=None
Please help me, thanks a lot.
Encountered same problem
I got the same problem.
Ditto here. Which Orin is everyone using? Has anyone seen a guide that's worked?
XLA_PYTHON_CLIENT_PREALLOCATE=false uv run scripts/serve_policy.py
Did this solve the issue? FYI, the original post doesn't have any errors, only INFO logs.
Dear all, please leave a comment if the above solution solved the issue.
Since the Jetson Orin has unified memory shared between the CPU and GPU, preallocating a large amount of GPU memory can leave the Orin busy swapping. Setting preallocation to false solved the issue in my setup.
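To make the preallocation point concrete, here is a minimal sketch of setting the XLA allocator flags from Python instead of the shell. The variables must be exported before jax is imported; the 0.5 memory fraction is my own assumption to tune per board, not something from this thread:

```python
# Hedged sketch: disable JAX GPU memory preallocation on unified-memory
# boards like the Jetson Orin. These env vars only take effect if they
# are set BEFORE `import jax` initializes the XLA backend.
import os

# Don't grab a large chunk of memory up front.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
# Assumption: also cap JAX at ~50% of memory; adjust for your board.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.5"

# import jax  # only import jax after the variables above are in place
```

This is equivalent to prefixing the launch command with the variables, as in the `XLA_PYTHON_CLIENT_PREALLOCATE=false uv run ...` suggestion above.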
No, this didn't solve the problem in my case. I'm currently working on a Jetson Orin Nano Super (8 GB) kit. I have successfully run the no-robot inference on my server; on the Jetson, however, it gets stuck after a while (it looks like an OOM, since my VS Code sometimes gets killed).
INFO:absl:[process=0][thread=MainThread] No metadata found for any process_index, checkpoint_dir=/home/fjn-jetson/workdir/openpi-assets/checkpoints/pi0_fast_droid/params. time elapsed=0.0024247169494628906 seconds. If the checkpoint does not contain jax.Array then it is expected. If checkpoint contains jax.Array then it should lead to an error eventually; if no error is raised then it is a bug.
After this there are no more messages and the terminal starts a new line; up to that point the logs are identical to what I saw when running on my A800 server.
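One way to check whether the silent hang is memory pressure (an OOM/swap stall) rather than a crash is to watch available memory while the checkpoint loads. A minimal sketch, assuming a Linux system with /proc/meminfo (the helper names are mine, not from openpi):

```python
# Hedged sketch: poll MemAvailable while serve_policy.py loads the
# checkpoint. Linux-only; reads /proc/meminfo, so no extra packages.
import time


def available_gb(meminfo_path: str = "/proc/meminfo") -> float:
    """Return currently available system memory in gigabytes."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / 1e6  # value is in kB
    raise RuntimeError("MemAvailable not found in /proc/meminfo")


def watch_memory(seconds: float = 30.0, interval: float = 1.0) -> None:
    """Print available memory once per interval in a second terminal."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        print(f"available: {available_gb():.2f} GB")
        time.sleep(interval)
```

If the available figure collapses toward zero right when the log output stops, the hang is the board swapping, which matches the unified-memory explanation above.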
same issue
It's a JAX issue; JAX isn't really compatible with the Jetson out of the box. https://github.com/jax-ml/jax. I got it working on the AGX Orin with the CPU backend, but haven't figured out how to get it running on the GPU.
I'll give https://forums.developer.nvidia.com/t/questions-to-install-jax-0-3-25-on-orin/332829/3 a try
Hello, our team has successfully deployed the pi0.5 model on a Jetson Orin 64GB DK with JetPack 6, with a single inference taking approximately 1 second. Here is our blog link; we hope it helps. Unfortunately, we currently have no plans to write an English version of the blog, but you can use your browser's translation feature. If you run into any problems, you can leave a message here.
https://blog.csdn.net/nenchoumi3119/article/details/154258492?spm=1001.2014.3001.5502