
dataset format versioning

Open edgarriba opened this issue 4 months ago • 7 comments

Hi, what's the latest supported version of the LeRobot dataset format? They have been changing the format and are now on v3.0.

edgarriba avatar Oct 02 '25 14:10 edgarriba

+1, I can confirm this issue. I've done a deep-dive and it seems to be a versioning paradox between the code and the data.

The lerobot code requires an older version of huggingface/datasets to avoid a TypeError, but the dataset on the Hub requires a newer version of huggingface/datasets to avoid a ValueError.

Detailed Analysis

I performed a fresh deployment on a Linux HPC cluster and encountered two distinct failure modes.

Failure Mode 1: Crash with Latest Dependencies

A fresh install with the latest dependencies (via pip or uv sync) crashes with a TypeError inside the lerobot library when creating the data loader. This suggests an incompatibility with a recent datasets API.

Traceback:

Traceback (most recent call last):
  File "/home/yitong005/openpi/examples/inference.py", line 78, in <module>
    loader = _data_loader.create_data_loader(config, num_batches=1, skip_norm_stats=True)
  File "/home/yitong005/openpi/src/openpi/training/data_loader.py", line 141, in create_torch_dataset
    dataset = lerobot_dataset.LeRobotDataset(...)
  File "/home/yitong005/.conda/envs/openpi_env/lib/python3.11/site-packages/lerobot/common/datasets/lerobot_dataset.py", line 508, in __init__
    timestamps = torch.stack(self.hf_dataset["timestamp"]).numpy()
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Column
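For reference, the TypeError can be reproduced outside openpi. This is a minimal sketch, assuming the behavior of recent datasets releases (4.x and later), where indexing a column of a torch-formatted Dataset returns a lazy Column object instead of a list of tensors; materializing the column first is one way to sidestep the stack() call, though pinning datasets is the cleaner fix.

# Minimal sketch of Failure Mode 1 (assumes datasets >= 4.0, where column
# indexing returns a lazy Column rather than a list of tensors).
import torch
from datasets import Dataset

ds = Dataset.from_dict({"timestamp": [0.0, 0.1, 0.2]}).with_format("torch")
col = ds["timestamp"]              # newer datasets: a Column object
# torch.stack(col)                 # -> TypeError: ... must be tuple of Tensors, not Column
stacked = torch.stack(list(col))   # materializing the column first works around it
print(stacked)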

Failure Mode 2: Crash with Downgraded datasets Library

To work around the TypeError, I downgraded the library (pip install datasets==3.6.0). This resolved the first error, but introduced a new ValueError when load_dataset tries to parse the dataset's feature schema from the Hub.

Traceback:

Traceback (most recent call last):
  File "/home/yitong005/openpi/examples/inference.py", line 78, in <module>
    loader = _data_loader.create_data_loader(config, num_batches=1, skip_norm_stats=True)
  File "/home/yitong005/.conda/envs/openpi_env/lib/python3.11/site-packages/lerobot/common/datasets/lerobot_dataset.py", line 620, in load_hf_dataset
    hf_dataset = load_dataset("parquet", data_dir=path, split="train")
  File "/home/yitong005/.conda/envs/openpi_env/lib/python3.11/site-packages/datasets/features/features.py", line 1474, in generate_from_dict
    raise ValueError(f"Feature type '{_type}' not found...")
ValueError: Feature type 'List' not found. Available feature types: ['Value', 'ClassLabel', 'Translation', ...]
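My reading is that the 'List' feature type was introduced in newer datasets releases (4.x), which is presumably what wrote the schema now on the Hub, so 3.6.0 cannot parse it. A rough sketch that reproduces the lookup failure; the schema dict below is a contrived example, not the actual schema from the Hub.

# Sketch: reproduce the schema-parsing failure on datasets 3.x. On datasets >= 4.0
# this dict parses into a List feature; on 3.x generate_from_dict raises the
# ValueError shown in the traceback above.
from datasets.features.features import generate_from_dict

schema = {"action": {"_type": "List", "feature": {"_type": "Value", "dtype": "float32"}}}
try:
    print(generate_from_dict(schema))
except ValueError as err:
    print("ValueError:", err)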

Environment

Python: 3.11
Platform: Linux (Slurm HPC cluster)
Installation: uv sync / pip
Key libraries: datasets (Hugging Face), tested latest vs. 3.6.0

This seems to confirm that the project is currently in a broken state due to this dependency version conflict. Hope this helps track down the issue.

Alegunm avatar Oct 04 '25 14:10 Alegunm

Did you solve this problem? I'm running into it as well.

xuxiaoxxxx avatar Oct 07 '25 16:10 xuxiaoxxxx

Is this issue fixed? I'm facing the same problem. Are there any workarounds?

jj701 avatar Oct 22 '25 03:10 jj701

I also encountered the same problem: datasets in format v2.0 or v3.0 cannot be used directly, but v2.1 works in my testing.
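For anyone who wants to check which format version they have locally before debugging further, the LeRobot dataset format version is recorded in the dataset's meta/info.json. A small sketch, assuming the default cache location; "your-org/your-dataset" is a placeholder.

# Sketch: print the LeRobot dataset format version of a locally cached dataset.
# The cache path and repo id are assumptions; adjust them for your setup.
import json
from pathlib import Path

root = Path.home() / ".cache/huggingface/lerobot" / "your-org/your-dataset"
info = json.loads((root / "meta" / "info.json").read_text())
print("codebase_version:", info.get("codebase_version"))  # v2.1 reportedly works; v2.0/v3.0 do not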

sunmoon2018 avatar Oct 27 '25 08:10 sunmoon2018

I'm seeing the same thing as Failure Mode 2 in @Alegunm's analysis above. Does downgrading lerobot work?

Virlus avatar Oct 28 '25 03:10 Virlus

It seems that downgrading to datasets==3.6.0 works for me.
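If others want to confirm the downgrade took effect in the environment openpi actually imports from, a quick sanity check before re-running examples/inference.py (just a sketch):

# Sketch: verify the downgraded datasets version is the one that will be imported.
import datasets
print("datasets", datasets.__version__)
assert datasets.__version__ == "3.6.0", f"unexpected datasets version: {datasets.__version__}"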

Virlus avatar Oct 28 '25 05:10 Virlus

Has this been resolved? My dataset is currently in format v3.0.

exaFLOPs26 avatar Nov 28 '25 08:11 exaFLOPs26