
Document how to use TFDS on Colab with TPU

danieljanes opened this issue • 12 comments

What I need help with / What I was wondering

When trying to use TFDS on Google Colab with TPU acceleration, the following exception is raised:

UnimplementedError: File system scheme '[local]' not implemented

What I've tried so far

From e.g. https://cloud.google.com/tpu/docs/quickstart one can see that TPUs expect data to be stored on GCS.

However, there are examples using Keras+TPU on Colab that load data via tf.keras.datasets, such as: https://colab.research.google.com/gist/ceshine/f196d6b030adb1ec3a8d0b50642709dc/keras-fashion-mnist-tpu.ipynb

It would be nice if...

...there was documentation on how to use TFDS with Keras on a TPU in Colab.

danieljanes avatar Apr 19 '19 11:04 danieljanes

Good idea. Did you try setting data_dir to a GCS bucket?
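For example, something along these lines (a minimal sketch; the bucket path is a placeholder, not a value from this thread):

import tensorflow_datasets as tfds

# data_dir can point at a GCS bucket; TFDS will download, prepare,
# and read the dataset there instead of on the local filesystem.
ds = tfds.load('cifar10', data_dir='gs://your-bucket/tfds')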

rsepassi avatar Apr 22 '19 18:04 rsepassi

Thanks @rsepassi, I created a GCS bucket and I'm passing the bucket identifier gs://... to TFDS using data_dir. It's returning a 401 though:

tensorflow.python.framework.errors_impl.PermissionDeniedError: Error executing an HTTP request: HTTP response code 401 with body '{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "required",
    "message": "Anonymous caller does not have storage.objects.get access to [...]/cifar10.",
    "locationType": "header",
    "location": "Authorization"
   }
  ],
  "code": 401,
  "message": "Anonymous caller does not have storage.objects.get access to [...]/cifar10."
 }
}
'
         when reading metadata of gs://ox-dnn-tpu/cifar10

I'd guess it's related to authentication and permissions on the GCS bucket, but I'm not quite sure how to set these up in a way that TFDS can use them behind the scenes.

danieljanes avatar Apr 23 '19 19:04 danieljanes

Thanks for this. The issue seems to be that the machine you're using doesn't have permission to access the GCS bucket you created. Could you try following https://cloud.google.com/tpu/docs/storage-buckets and see if that works?

rsepassi avatar Apr 23 '19 21:04 rsepassi

@rsepassi this doc references a so-called project number:

https://cloud.google.com/tpu/docs/storage-buckets#locate_the_service_account

How can we find the project number for Colab? Also, is the project number stable across different runs?

danieljanes avatar May 08 '19 15:05 danieljanes

I am running this code from my TF 2.1.0 Docker container on Ubuntu 16.04, with a host machine directory mounted inside the container. Is there any way I can use the host volume mounted inside Docker instead of a GCS bucket? @rsepassi, any help you can offer in this regard? There are hundreds of thousands of deep learning beginners at Indian universities who don't have access to GCS and need an easy way to use this beyond the Colab limits. If this issue can be solved, it will help me publish content for them.

puneetjindal avatar Feb 27 '20 07:02 puneetjindal

Hi @danieljanes: I ran into these problems too, and here's the solution for my case:

  1. When you face the anonymous caller problem, you need to get your account authenticated:

from google.colab import auth
auth.authenticate_user()

  2. Once your account is authenticated, test that you can read data from GCS:

!gsutil ls gs://[BUCKET_NAME]

  3. Once you can read data from GCS, grant the TPU access to the bucket:

!gsutil acl ch -u [SERVICE_ACCOUNT]:READER gs://[BUCKET_NAME]
!gsutil acl ch -u [SERVICE_ACCOUNT]:WRITER gs://[BUCKET_NAME]

where SERVICE_ACCOUNT is service-[PROJECT_NUMBER]@cloud-tpu.iam.gserviceaccount.com.

  4. The PROJECT_NUMBER for Colab is contained in the error message returned when the TPU is not yet authorized, which should read:

service-[PROJECT_NUMBER]@cloud-tpu.iam.gserviceaccount.com does not have storage.objects.get access to ...

So first run your TPU job without TPU auth, wait for that error message, and then grant the TPU access as above.

  5. In my case, I also needed to grant access for every file in the subfolders:

gsutil acl ch -u [SERVICE_ACCOUNT]:READER gs://[BUCKET_NAME]/subfolder/*.tfrec
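Putting the steps together, a single Colab cell might look like this (a sketch: PROJECT_NUMBER, BUCKET_NAME, and the dataset name are placeholders, not values from this thread):

from google.colab import auth
import tensorflow_datasets as tfds

# Authenticate the Colab user so gsutil and TFDS can reach GCS.
auth.authenticate_user()

# Grant the Cloud TPU service account read/write access to the bucket.
!gsutil acl ch -u service-[PROJECT_NUMBER]@cloud-tpu.iam.gserviceaccount.com:READER gs://[BUCKET_NAME]
!gsutil acl ch -u service-[PROJECT_NUMBER]@cloud-tpu.iam.gserviceaccount.com:WRITER gs://[BUCKET_NAME]

# Point TFDS at the bucket so the prepared data lives where the TPU can read it.
ds = tfds.load('cifar10', data_dir='gs://[BUCKET_NAME]/tfds')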

I sincerely hope this helps.

ValleyZw avatar Mar 10 '20 04:03 ValleyZw

Hi @ValleyZw , thanks for getting back to me about this! I'll give it a shot next time I work on Colab.

danieljanes avatar Mar 10 '20 12:03 danieljanes

I was able to run on Cloud v3 TPUs using local files; there's an example here: https://github.com/sayakpaul/Generating-categories-from-arXiv-paper-titles/blob/master/TPU_Experimentation.ipynb.

sayakpaul avatar Mar 31 '20 10:03 sayakpaul

Hi @ValleyZw: I can use gsutil; e.g. !gsutil ls gs://reu/data/ returns:

gs://reu/data/
gs://reu/data/corpus.0.tfrecord
gs://reu/data/corpus.1.tfrecord

But I don't know why I can't use Python to read the file:

with open("gs://reu/data/corpus.0.tfrecord", 'r') as f:
    print(f)

and using the paths in my code fails:

corpus_paths = [
    f'gs://reu/data/corpus.{i}.tfrecord' for i in range(10)
]

Please help me! Thanks! I also noticed that os.path doesn't see the path: os.path.exists('gs://reu/data/') is False.

aigonna avatar Sep 18 '21 14:09 aigonna

You can use the TFDS pathlib-like API, which works with GCS paths:

# as_path returns a pathlib-like object that understands gs:// URIs.
path = tfds.core.as_path('gs://reu/data/corpus.0.tfrecord')

# Open it like a regular file object...
with path.open('rb') as f:
  pass

# ...or read the whole file at once.
content = path.read_bytes()

assert path.exists()
assert path.name == 'corpus.0.tfrecord'

See https://docs.python.org/3/library/pathlib.html to learn more about pathlib.
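Note that the built-in open() and os.path only understand local filesystem paths, which is why they fail on gs:// URIs. For a tf.data pipeline, tf.data.TFRecordDataset accepts GCS paths directly, e.g. (a sketch reusing the paths from the comment above):

import tensorflow as tf

# tf.data reads GCS paths natively; no local copy is needed.
corpus_paths = [f'gs://reu/data/corpus.{i}.tfrecord' for i in range(10)]
dataset = tf.data.TFRecordDataset(corpus_paths)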

Conchylicultor avatar Sep 20 '21 08:09 Conchylicultor

@Conchylicultor Thanks!

aigonna avatar Sep 24 '21 10:09 aigonna

Thanks @ValleyZw , your solution worked for me. However, I found that I had to restart the runtime after adding the authentication in order to get everything working properly.

Some sort of documentation or improved error messages would definitely be helpful, since it took a few Google searches to end up here and find a solution. I did try https://cloud.google.com/tpu/docs/storage-buckets, but it didn't resolve the 401 error for me. Maybe it just needed a restart, though.

TylerADavis avatar Jul 20 '22 21:07 TylerADavis