dataset format versioning
Hi, what's the latest supported LeRobot dataset version? They have been changing the format, and it's now at 3.0.
Did you solve this problem? I'm running into it too.
Is this issue fixed? I'm facing the same issue. Are there any workarounds?
I ran into the same problem: datasets in format version 2.0 or 3.0 cannot be used directly, but I have tested version 2.1 and it works.
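Before loading, it can help to check which format version a local dataset actually uses. A minimal sketch, assuming the LeRobot v2.x/v3.0 layout where `meta/info.json` stores a `"codebase_version"` string such as `"v2.1"`. The allow-list is hypothetical and based only on the report above that v2.1 works; `read_codebase_version` and `is_supported` are illustrative names, not lerobot APIs.

```python
import json
from pathlib import Path

# Hypothetical allow-list: only v2.1 has been reported as working in this
# thread. Adjust it for whatever lerobot release you have installed.
SUPPORTED_VERSIONS = frozenset({"v2.1"})


def read_codebase_version(dataset_root):
    """Return the "codebase_version" recorded in <root>/meta/info.json.

    Assumes the LeRobot v2.x/v3.0 layout, where meta/info.json stores the
    format version as a string such as "v2.1". Returns None if the field
    is missing.
    """
    info_path = Path(dataset_root) / "meta" / "info.json"
    with info_path.open(encoding="utf-8") as f:
        info = json.load(f)
    return info.get("codebase_version")


def is_supported(dataset_root, supported=SUPPORTED_VERSIONS):
    """True if the local dataset's format version is in `supported`."""
    return read_codebase_version(dataset_root) in supported
```

For example, `is_supported("/path/to/my_dataset")` returns `False` for a dataset whose `info.json` says `"v3.0"`.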
+1, I can confirm this issue. I've done a deep-dive and it seems to be a versioning paradox between the code and the data.
The lerobot code requires an older version of huggingface/datasets to avoid a TypeError, but the dataset on the Hub requires a newer version of huggingface/datasets to avoid a ValueError.
Detailed analysis: I performed a fresh deployment on a Linux HPC cluster and encountered two distinct failure modes.
Failure Mode 1: Crash with Latest Dependencies
A fresh install with the latest dependencies (via pip or uv sync) crashes with a TypeError inside the lerobot library when creating the data loader. The traceback shows torch.stack() receiving a datasets Column object instead of a sequence of tensors, which suggests an incompatibility with a recent datasets API.
Traceback:
Traceback (most recent call last):
  File "/home/yitong005/openpi/examples/inference.py", line 78, in <module>
    loader = _data_loader.create_data_loader(config, num_batches=1, skip_norm_stats=True)
  File "/home/yitong005/openpi/src/openpi/training/data_loader.py", line 141, in create_torch_dataset
    dataset = lerobot_dataset.LeRobotDataset(...)
  File "/home/yitong005/.conda/envs/openpi_env/lib/python3.11/site-packages/lerobot/common/datasets/lerobot_dataset.py", line 508, in __init__
    timestamps = torch.stack(self.hf_dataset["timestamp"]).numpy()
TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not Column
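One possible local workaround for Failure Mode 1 is to materialize the column before stacking. A minimal sketch, assuming the Column object returned by newer datasets releases is iterable like a list, and that each item is already a tensor (that depends on the format/transform lerobot sets on `hf_dataset`). `materialize_column` is a hypothetical helper, not part of lerobot or datasets.

```python
def materialize_column(column):
    """Return the items of an hf_dataset column as a tuple.

    Older datasets releases return a plain list for hf_dataset["timestamp"];
    the traceback above shows a newer release returning a Column object,
    which torch.stack() rejects. Iterating the column and packing the items
    into a tuple gives torch.stack() the "tuple of Tensors" it asks for.
    """
    if isinstance(column, tuple):
        return column
    return tuple(column)


# Hypothetical local patch of the failing line in lerobot_dataset.py:
#   timestamps = torch.stack(materialize_column(self.hf_dataset["timestamp"])).numpy()
```

This only addresses the TypeError on the code side; it does not help with the schema problem in Failure Mode 2.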
Failure Mode 2: Crash with Downgraded datasets Library
To work around the TypeError, I downgraded the library (pip install datasets==3.6.0). This resolved the first error, but introduced a new ValueError when load_dataset tries to parse the dataset's feature schema from the Hub.
Traceback:
Traceback (most recent call last):
  File "/home/yitong005/openpi/examples/inference.py", line 78, in <module>
    loader = _data_loader.create_data_loader(config, num_batches=1, skip_norm_stats=True)
  File "/home/yitong005/.conda/envs/openpi_env/lib/python3.11/site-packages/lerobot/common/datasets/lerobot_dataset.py", line 620, in load_hf_dataset
    hf_dataset = load_dataset("parquet", data_dir=path, split="train")
  File "/home/yitong005/.conda/envs/openpi_env/lib/python3.11/site-packages/datasets/features/features.py", line 1474, in generate_from_dict
    raise ValueError(f"Feature type '{_type}' not found...")
ValueError: Feature type 'List' not found. Available feature types: ['Value', 'ClassLabel', 'Translation', ...]
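To tell in advance whether a dataset will hit Failure Mode 2, one can scan its serialized feature schema for the `List` type named in the traceback. A minimal sketch, assuming you already have the schema as parsed JSON in which each feature is a dict with a `"_type"` key (if I understand the storage correctly, datasets writes this into the Parquet schema metadata under a `"huggingface"` key; reading that needs pyarrow, so this sketch starts from the parsed dict). The function names are hypothetical.

```python
# Feature type reported as unknown in the Failure Mode 2 traceback.
TYPES_UNKNOWN_TO_DATASETS_3_6 = frozenset({"List"})


def find_feature_types(schema):
    """Collect every "_type" name in a serialized datasets feature schema.

    `schema` is a parsed JSON structure in which each feature is a dict with
    a "_type" key (e.g. {"_type": "Value", "dtype": "float32"}), nested
    arbitrarily inside dicts and lists.
    """
    found = set()
    pending = [schema]
    while pending:
        node = pending.pop()
        if isinstance(node, dict):
            type_name = node.get("_type")
            if isinstance(type_name, str):
                found.add(type_name)
            pending.extend(node.values())
        elif isinstance(node, list):
            pending.extend(node)
    return found


def needs_newer_datasets(schema, unknown=TYPES_UNKNOWN_TO_DATASETS_3_6):
    """Return the sorted feature types that datasets 3.6.0 reportedly cannot parse."""
    return sorted(find_feature_types(schema) & unknown)
```

A non-empty result means downgrading to datasets 3.6.0 would likely reproduce the ValueError above for that dataset.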
Environment:
Python: 3.11
Platform: Linux (Slurm HPC Cluster)
Installation: uv sync / pip
Key Libraries: huggingface-datasets (tested latest vs. 3.6.0)
This seems to confirm that the project is currently in a broken state due to this dependency version conflict. Hope this helps track down the issue.
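The paradox described above can be summarized as a small decision function. A minimal sketch under a loudly stated assumption: that the `Column` return type (Failure Mode 1) and the `List` feature type (Failure Mode 2) both arrived in datasets 4.0. That boundary is my assumption, not something tested in this thread, so verify it against the datasets release notes. `expected_failure` is a hypothetical name.

```python
import re

# ASSUMED boundary: the datasets release where hf_dataset["col"] started
# returning a Column object and the 'List' feature type was introduced.
COLUMN_API_VERSION = (4, 0)


def parse_major_minor(version):
    """Parse the leading "MAJOR.MINOR" of a version string like "3.6.0"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def expected_failure(datasets_version, schema_uses_list, boundary=COLUMN_API_VERSION):
    """Predict which failure mode from this thread a combination hits.

    Returns "TypeError" (Failure Mode 1), "ValueError" (Failure Mode 2),
    or None when neither is expected.
    """
    if parse_major_minor(datasets_version) >= boundary:
        return "TypeError"  # lerobot code calls torch.stack on a Column
    if schema_uses_list:
        return "ValueError"  # older datasets cannot parse the 'List' feature type
    return None
```

Under this assumption, no single datasets version works for a dataset whose schema uses `List`, while a dataset without `List` should load on 3.6.0.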
Same as Failure Mode 2 here. Does downgrading lerobot work?
It seems that downgrading to datasets==3.6.0 works for me.
Has this error been solved? My dataset is currently in format version 3.0.