load_dataset() Fails with NotImplementedError Due to LocalFileSystem Cache in Colab
load_dataset("argilla/synthetic-concise-reasoning-sft-filtered") raises a NotImplementedError in Colab due to incompatible local cache handling.
I'm running into the following issue when trying to load the dataset:
Error: NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.
Reproduction steps:
from datasets import load_dataset
ds = load_dataset("argilla/synthetic-concise-reasoning-sft-filtered")
Followed https://github.com/google-deepmind/gemma/issues/260 to resolve dependencies.
I also got the following errors from these installs:
!pip3 install ai-edge-torch-nightly==0.6.0.dev20250605
!pip3 install ai-edge-litert==1.3.0
!pip3 install mediapipe==0.10.21
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible.
ydf 0.12.0 requires protobuf<6.0.0,>=5.29.1, but you have protobuf 4.25.8 which is incompatible.
grpcio-status 1.71.0 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 4.25.8 which is incompatible.
tensorflow 2.18.0 requires ml-dtypes<0.5.0,>=0.4.0, but you have ml-dtypes 0.5.1 which is incompatible.
(The second install repeated the same resolver warning and the tensorflow/ml-dtypes conflict.)
Hi @kartmpk ,
Welcome to the Google Gemma family of open-source models. The above error occurs due to a version-compatibility issue between the datasets and fsspec libraries: fsspec needs to be pinned to a specific version (fsspec==2023.9.2) to avoid the conflict. Please also make sure your datasets library is up to date, and clear your cache before downloading the dataset to avoid any cache-related issues.
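The cache-clearing step suggested above can be sketched like this. This is a minimal sketch that assumes the default Hugging Face cache location in Colab; the fsspec pin (e.g. pip install -U datasets fsspec==2023.9.2) would be applied first:

```python
import os
import shutil

# Assumption: datasets uses the default cache location (~/.cache/huggingface/datasets).
cache_dir = os.path.expanduser("~/.cache/huggingface/datasets")

# Remove any dataset files cached under the old/incompatible fsspec,
# so the next load_dataset() call starts from a clean state.
if os.path.isdir(cache_dir):
    shutil.rmtree(cache_dir)

# Then re-download fresh (needs network access, so left commented out here):
# from datasets import load_dataset
# ds = load_dataset("argilla/synthetic-concise-reasoning-sft-filtered")
```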
I have reproduced the issue in my local Colab and was able to resolve it. Please find the attached gist for your reference.
Thanks.
@Balakrishna-Chennamsetti thanks for the feedback
transformers requires fsspec>=2023.5.0, while gcsfs strictly requires fsspec==2025.3.2, causing a version conflict. Pip cannot resolve this mismatch, leading to broken dependencies during installation.
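To confirm which versions pip actually left installed (and hence whether a pin took effect), a quick check like the following can help. This is purely an illustrative sketch, not part of the fix itself:

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if it is absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Packages involved in the conflict described above.
for pkg in ("fsspec", "gcsfs", "transformers", "datasets"):
    print(f"{pkg}: {installed_version(pkg)}")
```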
Can you share a working gist showing a successful installation of all these dependencies?
!pip3 install --upgrade -q -U bitsandbytes peft trl accelerate datasets fsspec==2023.9.2
!pip3 install git+https://github.com/huggingface/[email protected]
!pip3 install git+https://github.com/google-ai-edge/ai-edge-torch
!pip3 install ai-edge-litert
!pip3 install mediapipe
@Balakrishna-Chennamsetti following up if you had any thoughts.
@Balakrishna-Chennamsetti following up, can you share an example gist that works?
The Hugging Face Gemma/TRL stack (requires PyTorch, transformers<4.51, protobuf==4.x) is incompatible with TensorFlow/MediaPipe/AI-Edge stack (which requires protobuf>=5.x and often overwrites dependencies), making them unsafe to use together in a single Colab runtime.
am I missing anything here?
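One quick way to see which side of the protobuf split a given runtime is on (4.x for the HF Gemma/TRL stack, 5.x for TF/MediaPipe/AI-Edge). This is an illustrative check only, and it allows for protobuf not being installed at all:

```python
from importlib.metadata import version, PackageNotFoundError

def protobuf_major():
    """Return the installed protobuf major version, or None if not installed."""
    try:
        return int(version("protobuf").split(".")[0])
    except PackageNotFoundError:
        return None

major = protobuf_major()
if major is None:
    print("protobuf is not installed")
elif major < 5:
    print(f"protobuf {major}.x: the HF Gemma/TRL stack's range")
else:
    print(f"protobuf {major}.x: the TF/MediaPipe/AI-Edge stack's range")
```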
Hi @kartmpk ,
The Gemma models require recent versions of transformers and PyTorch, at least for the latest Gemma models to work. I'll check whether there is any possibility of getting all the libraries/packages into versions compatible with each other. Your continued interest and patience are really appreciated.
Thanks.
Hi @kartmpk ,
Thanks for your patience. Please find the attached gist, which has all the necessary dependency installations without any conflicts.
Please let me know if you require any additional assistance.
Thanks.