tfds.data_source errors

Open fengyang0317 opened this issue 2 years ago • 1 comment

Short description tfds.data_source does not work as intended.

Environment information

  • Operating System: google colab

  • Python version: google colab

  • tensorflow-datasets/tfds-nightly version: 4.9.3.dev202309300044

  • tensorflow/tf-nightly version: 2.15.0.dev20230929

  • Does the issue still exist with the last tfds-nightly package (pip install --upgrade tfds-nightly)? Yes

Reproduction instructions There are 5 errors in the colab. https://colab.research.google.com/drive/1QoIYKYUeM00VeVTElCfnSrQdQL0vfAyQ?usp=sharing

Expected behavior The colab runs without errors.

Additional context

fengyang0317 avatar Sep 30 '23 01:09 fengyang0317

Hi @fengyang0317, thanks for your interest in tfds.data_source. This API is meant to use random access (see the tutorial). There is currently no implementation of random access for GCS. If you have an efficient C++ implementation, please feel free to contribute to array_record.

We should provide a clearer error message, so I'll work on this.

In the meantime, you would have to first prepare the data locally before using tfds.data_source.
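A minimal sketch of that workaround, assuming the dataset can be prepared into a local directory in the array_record format, as the data source tutorial does. The helper names `is_remote_data_dir` and `load_local_data_source`, the `/tmp/tfds_local` directory, and the choice of `mnist` are illustrative assumptions, not part of TFDS or of the original colab. The check only looks for a `gs://` prefix, since GCS is the only unsupported backend mentioned here.

```python
from urllib.parse import urlparse

# Assumption: only GCS paths are rejected, because this thread only states
# that random access is not implemented for GCS.
REMOTE_SCHEMES = {"gs"}


def is_remote_data_dir(data_dir: str) -> bool:
    """Return True if data_dir looks like a GCS path such as gs://bucket/dir."""
    return urlparse(data_dir).scheme in REMOTE_SCHEMES


def load_local_data_source(name: str, split: str, data_dir: str):
    """Prepare a dataset locally, then open it as a random-access data source."""
    if is_remote_data_dir(data_dir):
        raise ValueError(
            f"{data_dir!r} is a remote path; prepare the data in a local "
            "directory before using the data source API."
        )
    # Imported lazily so the path check above works without TFDS installed.
    import tensorflow_datasets as tfds

    # Random access needs the array_record file format.
    builder = tfds.builder(name, data_dir=data_dir, file_format="array_record")
    builder.download_and_prepare()
    return builder.as_data_source(split=split)


if __name__ == "__main__":
    source = load_local_data_source("mnist", "train", "/tmp/tfds_local")
    print(len(source))  # number of examples in the split
    print(source[0])    # random access by index
```

Failing early on a `gs://` path is only a stand-in for the clearer error message mentioned above; the actual message in TFDS may differ.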

marcenacp avatar Oct 02 '23 07:10 marcenacp