load_dataset() Fails with NotImplementedError Due to LocalFileSystem Cache in Colab
load_dataset("argilla/synthetic-concise-reasoning-sft-filtered") raises a NotImplementedError in Colab due to incompatible local cache handling.
I'm running into the following issue when trying to load the dataset:
Error: NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.
Reproduction steps:
from datasets import load_dataset
ds = load_dataset("argilla/synthetic-concise-reasoning-sft-filtered")
Followed https://github.com/google-deepmind/gemma/issues/260 to resolve dependencies.
I also got the following errors from these installs:
!pip3 install ai-edge-torch-nightly==0.6.0.dev20250605
!pip3 install ai-edge-litert==1.3.0
!pip3 install mediapipe==0.10.21
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible.
ydf 0.12.0 requires protobuf<6.0.0,>=5.29.1, but you have protobuf 4.25.8 which is incompatible.
grpcio-status 1.71.0 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 4.25.8 which is incompatible.
tensorflow 2.18.0 requires ml-dtypes<0.5.0,>=0.4.0, but you have ml-dtypes 0.5.1 which is incompatible.
(The second install repeated the same resolver warning and the tensorflow/ml-dtypes conflict.)
Hi @kartmpk ,
Welcome to the Google Gemma family of open-source models. The above error occurs due to a version-compatibility issue between the datasets and fsspec libraries: fsspec needs to be pinned to a specific version (fsspec==2023.9.2) to avoid the conflict. Please also make sure your datasets library is up to date, and clear your cache before downloading the dataset to avoid any cache-related issues.
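The cache-clearing step suggested above can be sketched like this. This is a minimal sketch that assumes the default Hugging Face cache location in Colab; the fsspec pin (e.g. pip install -U datasets fsspec==2023.9.2) would be applied first:

```python
import os
import shutil

# Assumption: datasets uses the default cache location (~/.cache/huggingface/datasets).
cache_dir = os.path.expanduser("~/.cache/huggingface/datasets")

# Remove any dataset files cached under the old/incompatible fsspec,
# so the next load_dataset() call starts from a clean state.
if os.path.isdir(cache_dir):
    shutil.rmtree(cache_dir)

# Then re-download fresh (needs network access, so left commented out here):
# from datasets import load_dataset
# ds = load_dataset("argilla/synthetic-concise-reasoning-sft-filtered")
```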
I have reproduced the issue in my local Colab and was able to resolve it. Please find the attached gist for your reference.
Thanks.
@Balakrishna-Chennamsetti thanks for the feedback
transformers requires fsspec>=2023.5.0, while gcsfs strictly requires fsspec==2025.3.2, causing a version conflict. Pip cannot resolve this mismatch, leading to broken dependencies during installation.
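To confirm which versions pip actually left installed (and hence whether a pin took effect), a quick check like the following can help. This is purely an illustrative sketch, not part of the fix itself:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if it is absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Packages involved in the conflict described above.
for pkg in ("fsspec", "gcsfs", "transformers", "datasets"):
    print(f"{pkg}: {installed_version(pkg)}")
```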
Can you share a working gist showing a successful installation of all these dependencies?
!pip3 install --upgrade -q -U bitsandbytes peft trl accelerate datasets fsspec==2023.9.2
!pip3 install git+https://github.com/huggingface/[email protected]
!pip3 install git+https://github.com/google-ai-edge/ai-edge-torch
!pip3 install ai-edge-litert
!pip3 install mediapipe
@Balakrishna-Chennamsetti following up if you had any thoughts.
@Balakrishna-Chennamsetti following up, can you share an example gist that works?
The Hugging Face Gemma/TRL stack (requires PyTorch, transformers<4.51, protobuf==4.x) is incompatible with TensorFlow/MediaPipe/AI-Edge stack (which requires protobuf>=5.x and often overwrites dependencies), making them unsafe to use together in a single Colab runtime.
am I missing anything here?
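One quick way to see which side of the protobuf split a given runtime is on (4.x for the HF Gemma/TRL stack, 5.x for TF/MediaPipe/AI-Edge). This is an illustrative check only, and it allows for protobuf not being installed at all:

```python
from importlib.metadata import version, PackageNotFoundError

def protobuf_major():
    """Return the installed protobuf major version, or None if not installed."""
    try:
        return int(version("protobuf").split(".")[0])
    except PackageNotFoundError:
        return None

major = protobuf_major()
if major is None:
    print("protobuf is not installed")
elif major < 5:
    print(f"protobuf {major}.x: the HF Gemma/TRL stack's range")
else:
    print(f"protobuf {major}.x: the TF/MediaPipe/AI-Edge stack's range")
```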
Hi @kartmpk ,
The Gemma models require recent versions of transformers and PyTorch, at least for the latest Gemma models to work. I'll check whether there is any possibility of getting all the libraries/packages into versions compatible with each other. Your continued interest and patience are really appreciated.
Thanks.
Hi @kartmpk ,
Thanks for your patience. Please find the attached gist, which has all the necessary dependency installations without any conflicts.
Please let me know if you require any additional assistance.
Thanks.