
dataset format versioning

Open edgarriba opened this issue 4 months ago • 7 comments

Hi, what's the latest supported version of the LeRobot dataset format? They have been changing the format and are now on v3.0.

edgarriba avatar Oct 02 '25 14:10 edgarriba

+1, I can confirm this issue. I've done a deep-dive and it seems to be a versioning paradox between the code and the data.

The lerobot code requires an older version of huggingface/datasets to avoid a TypeError, but the dataset on the Hub requires a newer version of huggingface/datasets to avoid a ValueError.

Detailed Analysis

I performed a fresh deployment on a Linux HPC cluster and encountered two distinct failure modes.

Failure Mode 1: Crash with Latest Dependencies

A fresh install with the latest dependencies (via pip or uv sync) crashes with a TypeError inside the lerobot library when creating the data loader. This suggests an incompatibility with a recent datasets API.

Traceback:

Traceback (most recent call last):
  File "/home/yitong005/openpi/examples/inference.py", line 78, in <module>
    loader = _data_loader.create_data_loader(config, num_batches=1, skip_norm_stats=True)
  File "/home/yitong005/openpi/src/openpi/training/data_loader.py", line 141, in create_torch_dataset
    dataset = lerobot_dataset.LeRobotDataset(...)
  File "/home/yitong005/.conda/envs/openpi_env/lib/python3.11/site-packages/lerobot/common/datasets/lerobot_dataset.py", line 508, in __init__
    timestamps = torch.stack(self.hf_dataset["timestamp"]).numpy()
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Column
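For reference, the TypeError can be reproduced outside openpi. This is a minimal sketch, assuming the behavior of recent datasets releases (4.x and later), where indexing a column of a torch-formatted Dataset returns a lazy Column object instead of a list of tensors; materializing the column first is one way to sidestep the stack() call, though pinning datasets is the cleaner fix.

# Minimal sketch of Failure Mode 1 (assumes datasets >= 4.0, where column
# indexing returns a lazy Column rather than a list of tensors).
import torch
from datasets import Dataset

ds = Dataset.from_dict({"timestamp": [0.0, 0.1, 0.2]}).with_format("torch")
col = ds["timestamp"]              # newer datasets: a Column object
# torch.stack(col)                 # -> TypeError: ... must be tuple of Tensors, not Column
stacked = torch.stack(list(col))   # materializing the column first works around it
print(stacked)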

Failure Mode 2: Crash with Downgraded datasets Library

To work around the TypeError, I downgraded the library (pip install datasets==3.6.0). This resolved the first error, but introduced a new ValueError when load_dataset tries to parse the dataset's feature schema from the Hub.

Traceback:

Traceback (most recent call last):
  File "/home/yitong005/openpi/examples/inference.py", line 78, in <module>
    loader = _data_loader.create_data_loader(config, num_batches=1, skip_norm_stats=True)
  File "/home/yitong005/.conda/envs/openpi_env/lib/python3.11/site-packages/lerobot/common/datasets/lerobot_dataset.py", line 620, in load_hf_dataset
    hf_dataset = load_dataset("parquet", data_dir=path, split="train")
  File "/home/yitong005/.conda/envs/openpi_env/lib/python3.11/site-packages/datasets/features/features.py", line 1474, in generate_from_dict
    raise ValueError(f"Feature type '{_type}' not found...")
ValueError: Feature type 'List' not found. Available feature types: ['Value', 'ClassLabel', 'Translation', ...]
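My reading is that the 'List' feature type was introduced in newer datasets releases (4.x), which is presumably what wrote the schema now on the Hub, so 3.6.0 cannot parse it. A rough sketch that reproduces the lookup failure; the schema dict below is a contrived example, not the actual schema from the Hub.

# Sketch: reproduce the schema-parsing failure on datasets 3.x. On datasets >= 4.0
# this dict parses into a List feature; on 3.x generate_from_dict raises the
# ValueError shown in the traceback above.
from datasets.features.features import generate_from_dict

schema = {"action": {"_type": "List", "feature": {"_type": "Value", "dtype": "float32"}}}
try:
    print(generate_from_dict(schema))
except ValueError as err:
    print("ValueError:", err)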

Environment

Python: 3.11
Platform: Linux (Slurm HPC cluster)
Installation: uv sync / pip
Key libraries: datasets (Hugging Face), tested latest vs. 3.6.0

This seems to confirm that the project is currently in a broken state due to this dependency version conflict. Hope this helps track down the issue.

Alegunm avatar Oct 04 '25 14:10 Alegunm

Did you solve this problem? I'm running into it as well.

xuxiaoxxxx avatar Oct 07 '25 16:10 xuxiaoxxxx

Is this issue fixed? I'm facing the same problem. Are there any workarounds?

jj701 avatar Oct 22 '25 03:10 jj701

I also encountered the same problem: datasets in format v2.0 or v3.0 cannot be used directly, but v2.1 works in my testing.
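For anyone who wants to check which format version they have locally before debugging further, the LeRobot dataset format version is recorded in the dataset's meta/info.json. A small sketch, assuming the default cache location; "your-org/your-dataset" is a placeholder.

# Sketch: print the LeRobot dataset format version of a locally cached dataset.
# The cache path and repo id are assumptions; adjust them for your setup.
import json
from pathlib import Path

root = Path.home() / ".cache/huggingface/lerobot" / "your-org/your-dataset"
info = json.loads((root / "meta" / "info.json").read_text())
print("codebase_version:", info.get("codebase_version"))  # v2.1 reportedly works; v2.0/v3.0 do not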

sunmoon2018 avatar Oct 27 '25 08:10 sunmoon2018

I'm seeing the same thing as Failure Mode 2 in @Alegunm's analysis above. Does downgrading lerobot work?

Virlus avatar Oct 28 '25 03:10 Virlus

It seems that downgrading to datasets==3.6.0 works for me.
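If others want to confirm the downgrade took effect in the environment openpi actually imports from, a quick sanity check before re-running examples/inference.py (just a sketch):

# Sketch: verify the downgraded datasets version is the one that will be imported.
import datasets
print("datasets", datasets.__version__)
assert datasets.__version__ == "3.6.0", f"unexpected datasets version: {datasets.__version__}"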

Virlus avatar Oct 28 '25 05:10 Virlus

Has this been resolved? My dataset is currently in format v3.0.

exaFLOPs26 avatar Nov 28 '25 08:11 exaFLOPs26