
Error encountered when deploying openpi (pi0) on Jetson Orin

Open zhaodaxiang opened this issue 10 months ago • 11 comments

The code gets stuck.

Without Docker

Terminal window 1:

uv run examples/simple_client/main.py --env DROID

Terminal window 2:

uv run scripts/serve_policy.py --env DROID

uv run scripts/serve_policy.py --env DROID gets stuck at ~/openpi-main/src/openpi/models/model.py:

    with ocp.PyTreeCheckpointer() as ckptr:
        metadata = ckptr.metadata(params_path)
        item = {"params": metadata["params"]}

        params = ckptr.restore(
            params_path,
            ocp.args.PyTreeRestore(
                item=item,
                restore_args=jax.tree.map(
                    lambda _: ocp.ArrayRestoreArgs(sharding=sharding, restore_type=restore_type, dtype=dtype),
                    item,
                ),
            ),
        )["params"]

There is some INFO output:

INFO:root:Loading model...
INFO:2025-03-17 19:44:58,699:jax._src.xla_bridge:945: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:jax._src.xla_bridge:Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
INFO:2025-03-17 19:44:58,708:jax._src.xla_bridge:945: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
INFO:jax._src.xla_bridge:Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
[CudaDevice(id=0)] Orin
INFO:absl:orbax-checkpoint version: 0.11.1
INFO:absl:Created BasePyTreeCheckpointHandler: pytree_metadata_options=PyTreeMetadataOptions(support_rich_types=False), array_metadata_store=None
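For what it's worth, the 'rocm' and 'tpu' messages above are expected INFO-level noise on a CUDA machine; the [CudaDevice(id=0)] line indicates JAX did find the Orin GPU. A minimal sketch (hedged: it assumes nothing beyond the standard library unless jax happens to be installed) to confirm which devices JAX actually sees:

```python
# Minimal sketch to confirm which devices JAX sees. The function name
# jax_devices_or_reason is hypothetical, not part of openpi or JAX.
import importlib.util


def jax_devices_or_reason():
    """Return jax.devices() if jax is importable, else a short reason string."""
    if importlib.util.find_spec("jax") is None:
        return "jax is not installed in this environment"
    import jax

    # On a working Orin setup this should include a CUDA device, e.g.
    # [CudaDevice(id=0)]; a CPU-only jaxlib build would list CpuDevice instead.
    return jax.devices()


print(jax_devices_or_reason())
```

If this prints only CPU devices, the installed jaxlib build has no CUDA support, which would explain slow or hung model loading.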

Please help me, thanks a lot.

zhaodaxiang avatar Mar 17 '25 11:03 zhaodaxiang

Encountered same problem

yhuang432 avatar Apr 21 '25 12:04 yhuang432

I got the same problem.

lagrangeluo avatar May 15 '25 03:05 lagrangeluo

Ditto here. Which Orin is everyone using? Has anyone seen a guide that's worked?

796F avatar May 16 '25 21:05 796F

XLA_PYTHON_CLIENT_PREALLOCATE=false uv run scripts/serve_policy.py

gmgu avatar Jul 01 '25 08:07 gmgu

XLA_PYTHON_CLIENT_PREALLOCATE=false uv run scripts/serve_policy.py

Did this solve the issue? FYI, the original post doesn't have any errors, only INFO logs.

kvablack avatar Jul 10 '25 17:07 kvablack

Dear all, please leave a comment if the above solution solved the issue.

As the Jetson Orin has unified memory shared between the CPU and GPU, preallocating a large amount of GPU memory can make the Orin busy swapping memory. Therefore, setting preallocation to false solved the issue in my setup.
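A minimal sketch of what the workaround does in-process, assuming the variables are set before the first `import jax` (XLA_PYTHON_CLIENT_PREALLOCATE and XLA_PYTHON_CLIENT_MEM_FRACTION are real XLA client options; the 0.5 fraction below is only an illustrative value):

```python
import os

# Disable JAX's default behavior of preallocating a large fraction of GPU
# memory at startup; on Orin's unified memory this preallocation can starve
# the CPU side and trigger heavy swapping.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

# Alternative: keep preallocation but cap it (0.5 is an illustrative value):
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "0.5"

# Important: these must be set before the first `import jax`, otherwise the
# XLA client is already initialized and the settings have no effect.
print(os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"])
```

Setting the variable on the command line, as in the comment above, achieves the same thing without touching the code.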

gmgu avatar Jul 13 '25 00:07 gmgu

Dear all, please leave a comment if the above solution solved the issue.

No, this didn't solve the problem in my case. I am currently working on a Jetson Orin Nano Super (8 GB) kit. I have successfully run the no-robot inference on my server; however, on the Jetson it gets stuck after a while (it seems to be an OOM, as my VS Code is sometimes killed).

INFO:absl:[process=0][thread=MainThread] No metadata found for any process_index, checkpoint_dir=/home/fjn-jetson/workdir/openpi-assets/checkpoints/pi0_fast_droid/params. time elapsed=0.0024247169494628906 seconds. If the checkpoint does not contain jax.Array then it is expected. If checkpoint contains jax.Array then it should lead to an error eventually; if no error is raised then it is a bug.

After this there are no further messages and the terminal starts a new line; up to that point the logs are identical to what I got on my A800 server.
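If it really is an OOM on the 8 GB Nano, a quick sketch to confirm (hedged: tegrastats is the Jetson-specific tool and the dmesg check may need sudo; both are commented out so only the portable command runs):

```shell
# Show total/used memory; on Jetson this is the unified CPU+GPU pool, so the
# checkpoint load competes with everything else on the system.
free -h

# Jetson-specific live memory/GPU stats (run on-device; Ctrl+C to stop):
# sudo tegrastats

# Confirm whether the OOM killer fired (may need sudo):
# dmesg | grep -i "killed process"
```

A process silently stopping while the desktop apps get killed is consistent with the kernel OOM killer reclaiming memory during the checkpoint restore.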

Muyiyunzi avatar Jul 15 '25 12:07 Muyiyunzi

same issue

fengxiuyaun avatar Jul 30 '25 11:07 fengxiuyaun

It's a JAX issue; JAX is not really compatible with Jetson out of the box (https://github.com/jax-ml/jax). I got it working on the AGX Orin on CPU, but haven't figured out how to make it work with the GPU.

wzqvip avatar Sep 25 '25 17:09 wzqvip

I'll give https://forums.developer.nvidia.com/t/questions-to-install-jax-0-3-25-on-orin/332829/3 a try

evelynmitchell avatar Oct 04 '25 19:10 evelynmitchell

Hello, our team has successfully deployed the pi0.5 model on a Jetson Orin 64GB DK with JetPack 6, with a single inference time of approximately 1 second. Here is our blog link; we hope it helps. Unfortunately, we currently have no plans to write an English version of the blog, but you can use your browser's translation feature. If you encounter any problems, you can leave a message here.

https://blog.csdn.net/nenchoumi3119/article/details/154258492?spm=1001.2014.3001.5502

GaohaoZhou-ops avatar Nov 02 '25 10:11 GaohaoZhou-ops