Anonymous access to a GCP bucket fails with `ValueError: Anonymous credentials cannot be refreshed.`
Affects modelstore 0.0.74.
To reproduce:
# create a new environment (Python 3.8)
python -m venv env
source env/bin/activate
# install modelstore and the Google Cloud Storage client library
pip install modelstore google-cloud-storage
python
Python 3.8.8 (default, Apr 4 2021, 16:02:17)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelstore import ModelStore
>>> model_store = ModelStore.from_gcloud(bucket_name="xai-demo-models")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/model_store.py", line 90, in from_gcloud
return ModelStore(
File "<string>", line 4, in __init__
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/model_store.py", line 105, in __post_init__
if not self.storage.validate():
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/modelstore/storage/gcloud.py", line 128, in validate
if not self.bucket.exists():
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/storage/bucket.py", line 843, in exists
client._get_resource(
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/storage/client.py", line 366, in _get_resource
return self._connection.api_request(
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/storage/_http.py", line 73, in api_request
return call()
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/api_core/retry.py", line 283, in retry_wrapped_func
return retry_target(
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/api_core/retry.py", line 190, in retry_target
return target()
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 482, in api_request
response = self._make_request(
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 341, in _make_request
return self._do_request(
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/cloud/_http/__init__.py", line 379, in _do_request
return self.http.request(
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/auth/transport/requests.py", line 526, in request
self.credentials.refresh(auth_request)
File "/home/kilian/Documents/Ulm/GitHub/modelstorereplica/env/lib/python3.8/site-packages/google/auth/credentials.py", line 173, in refresh
raise ValueError("Anonymous credentials cannot be refreshed.")
ValueError: Anonymous credentials cannot be refreshed.
I remember encountering and resolving this issue while working on #142. We should have a look at the changes introduced by #161.
Output of pip freeze:
cachetools==5.0.0
certifi==2021.10.8
charset-normalizer==2.0.12
click==8.1.3
gitdb==4.0.9
GitPython==3.1.27
google-api-core==2.7.3
google-auth==2.6.6
google-cloud-core==2.3.0
google-cloud-storage==2.3.0
google-crc32c==1.3.0
google-resumable-media==2.3.2
googleapis-common-protos==1.56.0
idna==3.3
joblib==1.1.0
modelstore==0.0.74
numpy==1.22.3
protobuf==3.20.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.27.1
rsa==4.8
six==1.16.0
smmap==5.0.0
tqdm==4.64.0
urllib3==1.26.9
Thanks for reporting! I'll try to investigate soon, but I'm away at the moment. If you spot anything in the second PR you mentioned (#161), please let me know.
@ionicsolutions Can I also use the "xai-demo-models" bucket for testing? I'm going to re-run your code above. Otherwise, I'll create a public GCS bucket just for testing.
@nlathia Sure, go ahead and use it for now! It contains one model in one domain.
Just to log my investigation:
When I tried to replicate this, the first error I ran into was caused by some GCP environment variables I have set (which modelstore retrieves here), and it led to a slightly different exception:
raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/xai-demo-models?projection=noAcl&prettyPrint=false: <service-account-name> does not have storage.buckets.get access to the Google Cloud Storage bucket.
But when I removed those environment variables, I was able to replicate this:
raise ValueError("Anonymous credentials cannot be refreshed.")
ValueError: Anonymous credentials cannot be refreshed.
Similar errors have been reported here:
- https://github.com/mlflow/mlflow/issues/2925
- https://github.com/googleapis/python-storage/issues/102
I've managed to reproduce this error without modelstore. It is triggered by calling bucket.exists(), which is what modelstore's validate() uses to check that the GCP storage can be used.
Python 3.8.12 (default, Mar 24 2022, 23:17:02)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from google.cloud import storage
>>> bucket_name = "xai-demo-models"
>>> client = storage.Client.create_anonymous_client()
>>> bucket = client.bucket(bucket_name=bucket_name)
>>> bucket.exists()
[...]
File "/Users/neallathia/.pyenv/versions/modelstore-dev-3-8-12/lib/python3.8/site-packages/google/auth/credentials.py", line 173, in refresh
raise ValueError("Anonymous credentials cannot be refreshed.")
ValueError: Anonymous credentials cannot be refreshed.
I believe the problem is that bucket.exists() is not supported for anonymous clients. From the docs:
Such a client has only limited access to “public” buckets: listing their contents and downloading their blobs.
And listing the bucket's contents with the same anonymous client works without errors:
>>> iterator = client.list_blobs(bucket_name)
>>> for i in iterator:
... print(i.name)
...
operatorai-model-store/domains/visual-inspection.json
operatorai-model-store/visual-inspection/2022/03/04/15:01:29/artifacts.tar.gz
operatorai-model-store/visual-inspection/versions/212ec479-f565-4440-aad2-c5f8d2b7d4f1.json
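The docs quoted above name listing and downloading as the operations an anonymous client can perform. A minimal sketch of both, assuming google-cloud-storage 2.x (the version in the pip freeze above); the helper names list_blob_names and download_blob are hypothetical, and the client is passed in so the same helpers work with storage.Client.create_anonymous_client() or any object exposing the same methods.

```python
from typing import Any, List


def list_blob_names(client: Any, bucket_name: str, prefix: str = "") -> List[str]:
    """Return the names of all blobs under `prefix` in `bucket_name`."""
    # list_blobs() pages through results lazily; the comprehension consumes every page.
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]


def download_blob(client: Any, bucket_name: str, blob_name: str) -> bytes:
    """Download one blob's contents into memory."""
    # bucket() and blob() only build local handles; the HTTP request
    # happens inside download_as_bytes().
    return client.bucket(bucket_name).blob(blob_name).download_as_bytes()
```

For example, with client = storage.Client.create_anonymous_client(), download_blob(client, "xai-demo-models", "operatorai-model-store/domains/visual-inspection.json") should fetch the domain file listed above, without calling the bucket-metadata endpoint that exists() uses.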
This is also the key difference between the first PR (#142), where I only suggested using exists(), and the second PR (#161), where I actually changed the validate() function to use it.
Update: the exists() function does appear to work for bucket names that don't exist:
>>> bucket_name = "a-bucket-that-does-not-exist"
>>> client = storage.Client.create_anonymous_client()
>>> bucket = client.bucket(bucket_name=bucket_name)
>>> bucket.exists()
False
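Taken together, the two results suggest a mechanism. One plausible explanation, which is an assumption not verified in this thread: an anonymous request for a public bucket's metadata is rejected with HTTP 401 (anonymous callers typically lack storage.buckets.get), google-auth's authorized session reacts to a 401 by refreshing the credentials, and AnonymousCredentials.refresh() raises the ValueError seen in the traceback. A bucket that does not exist returns 404 instead, which exists() turns into False without any refresh. The toy model below simulates that flow with stand-in classes only; it does not call the real libraries, and its function names are hypothetical.

```python
from typing import Optional


class AnonymousCredentials:
    """Toy stand-in for google.auth.credentials.AnonymousCredentials."""

    def refresh(self, request: Optional[object]) -> None:
        # Like the real class: there is no token to refresh.
        raise ValueError("Anonymous credentials cannot be refreshed.")


class NotFound(Exception):
    """Toy stand-in for google.api_core.exceptions.NotFound (HTTP 404)."""


# Assumed default: the authorized session refreshes credentials only on 401.
REFRESH_STATUS_CODES = (401,)


def authorized_request(server_status: int, credentials: AnonymousCredentials) -> int:
    """Toy model of an authorized HTTP session handling one response."""
    if server_status in REFRESH_STATUS_CODES:
        credentials.refresh(request=None)  # raises for anonymous credentials
    return server_status


def bucket_exists(server_status: int) -> bool:
    """Toy model of Bucket.exists(): a 404 becomes False, not an exception."""
    try:
        status = authorized_request(server_status, AnonymousCredentials())
        if status == 404:
            raise NotFound("bucket not found")
        if status >= 400:
            # Other errors (e.g. 403 Forbidden) would surface as exceptions.
            raise RuntimeError(f"HTTP {status}")
        return True
    except NotFound:
        return False
```

In this model, bucket_exists(404) returns False, like the nonexistent bucket above, while bucket_exists(401) raises ValueError("Anonymous credentials cannot be refreshed."), like the call on xai-demo-models.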
Okay, I think that this PR has the fix (based on the above):
- https://github.com/operatorai/modelstore/pull/176
Comments welcome & thanks for raising this again @ionicsolutions.
In short: I first try exists(); if that raises a ValueError, I fall back to list_blobs(); and if that raises NotFound, the validation fails.
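The fallback described above can be sketched as follows. This is a minimal sketch, not the code merged in PR #176: the function name validate_bucket and its not_found_error parameter are hypothetical, and the client is duck-typed (anything with bucket() and list_blobs() methods) so the logic can be exercised without google-cloud-storage installed. In real use you would pass storage.Client.create_anonymous_client() and google.api_core.exceptions.NotFound.

```python
from typing import Any, Type


def validate_bucket(client: Any, bucket_name: str, not_found_error: Type[Exception]) -> bool:
    """Return True if `bucket_name` appears to be usable through `client`."""
    bucket = client.bucket(bucket_name)
    try:
        # Works for authenticated clients and returns False for missing buckets.
        return bucket.exists()
    except ValueError:
        # Anonymous clients can fail here with
        # "Anonymous credentials cannot be refreshed."; fall back to listing,
        # which anonymous clients are allowed to do on public buckets.
        pass
    try:
        # list_blobs() is lazy, so pull at most one item to force the request.
        next(iter(client.list_blobs(bucket_name, max_results=1)), None)
    except not_found_error:
        return False
    return True
```

Requesting a single item keeps the fallback cheap while still making a real API call, so a missing bucket raises NotFound. Any other exception from the listing propagates to the caller rather than being treated as a failed validation.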
Just to confirm, this is how it looks for me now!
modelstore-dev-3-8-12 ❯ python
Python 3.8.12 (default, Mar 24 2022, 23:17:02)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from modelstore import ModelStore
>>> model_store = ModelStore.from_gcloud(bucket_name="xai-demo-models")
IPython could not be loaded!
pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
>>> model_store.list_domains()
['visual-inspection']
>>> model_store.list_models("visual-inspection")
['212ec479-f565-4440-aad2-c5f8d2b7d4f1']
>>> model_store.get_model_info("visual-inspection", "212ec479-f565-4440-aad2-c5f8d2b7d4f1")
{'model': {'domain': 'visual-inspection', 'model_id': '212ec479-f565-4440-aad2-c5f8d2b7d4f1', 'model_type': {'library': 'tensorflow', ...
Thanks for solving this issue so quickly! I can confirm that it works with the latest main :-)
✅ This was released as part of modelstore==0.0.75
- https://github.com/operatorai/modelstore/pull/201
- https://pypi.org/project/modelstore/0.0.75/
Let me know if you see any other issues!