
ClientConnectorCertificateError on GET request to any blob

Open: v-hunt opened this issue 5 years ago · 8 comments

What happened: We are trying to read files from a Google Cloud Storage bucket, but every attempt fails with a certificate verification error.

What you expected to happen: gcsfs API commands (listing a bucket, opening a file, etc.) run without errors.

Minimal Complete Verifiable Example: Please note that this is only a minimal example. Any other command (e.g. the code for opening a file) fails with the same error.

import gcsfs
fs = gcsfs.GCSFileSystem(project='my-project')
fs.ls('my-bucket')

This code raises an exception. Traceback:

Traceback (most recent call last):
  File "/path/to/my-project/python3.7/site-packages/aiohttp/connector.py", line 936, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore  # noqa
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 981, in create_connection
    ssl_handshake_timeout=ssl_handshake_timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 1009, in _create_connection_transport
    await waiter
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/sslproto.py", line 530, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/sslproto.py", line 189, in feed_ssldata
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 774, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/path/to/my-project/python3.7/site-packages/IPython/core/interactiveshell.py", line 3417, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-daebfc8e4d60>", line 1, in <module>
    fs.ls('bduk-dev-tmt')
  File "/path/to/my-project/python3.7/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/path/to/my-project/python3.7/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/path/to/my-project/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/path/to/my-project/python3.7/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/path/to/my-project/python3.7/site-packages/gcsfs/core.py", line 808, in _ls
    out = await self._list_objects(path)
  File "/path/to/my-project/python3.7/site-packages/gcsfs/core.py", line 598, in _list_objects
    items, prefixes = await self._do_list_objects(path)
  File "/path/to/my-project/python3.7/site-packages/gcsfs/core.py", line 633, in _do_list_objects
    json_out=True,
  File "/path/to/my-project/python3.7/site-packages/gcsfs/core.py", line 494, in _call
    timeout=self.requests_timeout,
  File "/path/to/my-project/python3.7/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/path/to/my-project/python3.7/site-packages/aiohttp/client.py", line 483, in _request
    timeout=real_timeout
  File "/path/to/my-project/python3.7/site-packages/aiohttp/connector.py", line 523, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/path/to/my-project/python3.7/site-packages/aiohttp/connector.py", line 859, in _create_connection
    req, traces, timeout)
  File "/path/to/my-project/python3.7/site-packages/aiohttp/connector.py", line 1004, in _create_direct_connection
    raise last_exc
  File "/path/to/my-project/python3.7/site-packages/aiohttp/connector.py", line 986, in _create_direct_connection
    req=req, client_error=client_error)
  File "/path/to/my-project/python3.7/site-packages/aiohttp/connector.py", line 939, in _wrap_create_connection
    req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host www.googleapis.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1076)')]

Anything else we need to know?:

It looks like this issue may be caused by this. However, recompiling Python is not a practical solution; there should be a simpler workaround or fix.
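The traceback ends in OpenSSL's "unable to get local issuer certificate", which means the CA certificates loaded by Python's ssl module did not include an issuer in Google's certificate chain. The sketch below reports what that default trust store looks like on a given machine, using only the standard-library ssl module plus certifi if it happens to be installed; the function name is hypothetical. One plausible explanation for why google.cloud.storage works while gcsfs fails (an assumption, not confirmed in this thread) is that requests-based clients verify against certifi's bundle by default, whereas aiohttp builds its verifying context with ssl.create_default_context(), which relies on OpenSSL's default paths.

```python
import ssl


def describe_default_trust_store():
    """Report where this interpreter's ssl module looks for CA certificates.

    aiohttp (used by gcsfs) verifies TLS with a context created by
    ssl.create_default_context(), so these are the paths that matter for it.
    """
    paths = ssl.get_default_verify_paths()
    # cafile/capath are None when the file or directory does not exist.
    info = {
        "openssl_version": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
    }
    # get_ca_certs() only lists certificates loaded from a CA *file*;
    # certificates in a hashed CA directory are loaded lazily and not counted.
    ctx = ssl.create_default_context()
    info["loaded_ca_count"] = len(ctx.get_ca_certs())
    try:
        import certifi  # the bundle requests uses by default, if installed
        info["certifi_bundle"] = certifi.where()
    except ImportError:
        info["certifi_bundle"] = None
    return info


if __name__ == "__main__":
    for key, value in describe_default_trust_store().items():
        print(f"{key}: {value}")
```

On python.org builds of Python for macOS, which ship their own OpenSSL and do not read the system keychain, a cafile of None with a loaded_ca_count of 0 is a typical result until the bundled "Install Certificates.command" script (usually found in /Applications/Python 3.7/) has been run.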

This issue also makes pandas.read_excel and pandas.read_csv fail when reading from GCS, which makes it even more painful.

Environment:

  • Dask version: not used (gcsfs version: 0.7.1)
  • Python version: Python 3.7.4
  • Operating System: macOS Catalina 10.15.7
  • Install method (conda, pip, source): pip

pip freeze:

aiohttp==3.6.2
appnope==0.1.0
argon2-cffi==20.1.0
async-generator==1.10
async-timeout==3.0.1
attrs==19.3.0
backcall==0.2.0
bleach==3.2.1
cachetools==4.1.1
certifi==2020.6.20
cffi==1.14.3
chardet==3.0.4
click==7.1.2
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
Flask==1.1.2
fsspec==0.8.4
gcsfs==0.7.1
google-api-core==1.21.0
google-api-python-client==1.10.0
google-auth==1.19.2
google-auth-httplib2==0.0.4
google-auth-oauthlib==0.4.1
google-cloud-core==1.4.3
google-cloud-pubsub==1.7.0
google-cloud-storage==1.31.0
google-cloud-trace==0.23.0
google-crc32c==1.0.0
google-resumable-media==1.1.0
googleapis-common-protos==1.52.0
grpc-google-iam-v1==0.12.3
grpcio==1.30.0
httplib2==0.18.1
idna==2.9
importlib-metadata==2.0.0
ipykernel==5.3.4
ipython==7.18.1
ipython-genutils==0.2.0
itsdangerous==1.1.0
jedi==0.17.2
Jinja2==2.11.2
jsonschema==3.2.0
jupyter-client==6.1.7
jupyter-core==4.6.3
jupyterlab-pygments==0.1.2
MarkupSafe==1.1.1
mistune==0.8.4
multidict==4.7.6
nbclient==0.5.0
nbconvert==6.0.7
nbformat==5.0.8
nest-asyncio==1.4.1
notebook==6.1.4
numpy==1.19.2
oauthlib==3.1.0
opencensus==0.7.9
opencensus-context==0.1.1
packaging==20.4
pandas==1.1.2
pandocfilters==1.4.2
parso==0.7.1
pexpect==4.8.0
pickleshare==0.7.5
prometheus-client==0.8.0
prompt-toolkit==3.0.8
protobuf==3.12.2
ptyprocess==0.6.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
Pygments==2.7.1
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2020.1
PyYAML==5.3.1
pyzmq==19.0.2
requests==2.24.0
requests-oauthlib==1.3.0
rsa==4.6
Send2Trash==1.5.0
six==1.15.0
terminado==0.9.1
testpath==0.4.4
tornado==6.0.4
traitlets==5.0.4
typing-extensions==3.7.4.3
uritemplate==3.0.1
urllib3==1.25.9
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
wrapt==1.12.1
xlrd==1.2.0
yarl==1.5.1
zipp==3.3.0

v-hunt, Oct 19 '20 19:10

Do you succeed with other calls, such as connecting and listing a bucket? Do Google's own Python APIs work for you?

To me, an SSL error suggests that you may be behind some complex firewall or proxy. It seems unlikely that GCS requires some special weak cipher to be compiled into Python; other people are connecting just fine.
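If a proxy is involved, it is worth knowing which proxy settings the machine exposes. As a point of background (not stated in this thread): requests honors proxy environment variables by default, while aiohttp's ClientSession ignores them unless it is created with trust_env=True, so the two stacks can take different network paths. The minimal sketch below, with a hypothetical function name, lists HTTPS-relevant proxy settings via the standard library's urllib.request.getproxies(), which reads environment variables and, when none are set, the system proxy configuration on macOS and Windows.

```python
import urllib.request


def visible_proxy_settings(proxies=None):
    """Return proxy entries that could affect an https:// request.

    If `proxies` is None, settings are read with urllib.request.getproxies().
    """
    if proxies is None:
        proxies = urllib.request.getproxies()
    # "https" and "all" pick a proxy; "no" lists hosts that bypass it.
    return {scheme: url for scheme, url in proxies.items()
            if scheme in ("https", "all", "no")}


if __name__ == "__main__":
    print(visible_proxy_settings() or "no HTTPS proxy configured")
```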

martindurant, Oct 19 '20 20:10

Hi @martindurant,

| Do you succeed with other calls, such as connecting and listing a bucket?

I used the fs.ls() call in this example. I also tried to read a blob with fs.open() and got the same error, so I'm fairly sure this affects every HTTP call gcsfs makes.

| Do Google's own Python APIs work for you?

Because we can't read spreadsheets directly in pandas due to this issue, we read them with Google's official google.cloud.storage module and pass the contents to pandas as BytesIO objects. In other words, we can read any GCS file that way without issues, so it doesn't look like a general gateway problem.
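The workaround described above can be sketched as follows. It assumes the google-cloud-storage client API (Client.bucket(), Bucket.blob(), Blob.download_as_bytes(); older releases used download_as_string()), and the helper name is hypothetical. The client is passed in rather than created inside, which keeps the download step separate from authentication and makes it easy to swap in a stub.

```python
import io


def read_blob_to_buffer(client, bucket_name, blob_name):
    """Download one GCS object into an in-memory, file-like buffer.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    blob = client.bucket(bucket_name).blob(blob_name)
    data = blob.download_as_bytes()  # whole object is held in memory
    return io.BytesIO(data)


# Example usage (requires google-cloud-storage, pandas and credentials):
#   from google.cloud import storage
#   import pandas as pd
#   client = storage.Client(project="my-project")
#   df = pd.read_csv(read_blob_to_buffer(client, "my-bucket", "path/to/file.csv"))
```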

v-hunt, Oct 22 '20 15:10

Perhaps, with a combination of pdb and logging, you can figure out exactly which call the Google API is making, and then why gcsfs (via aiohttp) behaves differently. This error comes from deep within Python's ssl module.

Note that I don't see cryptography or pyopenssl (or any ssl) in your installed packages.

Please also check any environment variables or configuration you might have relating to certificate trust stores.
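A quick way to check for such configuration is sketched below; the function and constant names are hypothetical. SSL_CERT_FILE and SSL_CERT_DIR are read by OpenSSL when Python loads its default verify paths (and therefore affect aiohttp), whereas REQUESTS_CA_BUNDLE and CURL_CA_BUNDLE are read by requests only. A variable pointing at a missing or stale file could therefore break one stack but not the other.

```python
import os

# Environment variables that commonly override CA trust stores.
TRUST_STORE_VARS = ("SSL_CERT_FILE", "SSL_CERT_DIR",
                    "REQUESTS_CA_BUNDLE", "CURL_CA_BUNDLE")


def trust_store_overrides(environ=None):
    """Report which trust-store variables are set and whether their paths exist."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in TRUST_STORE_VARS:
        value = environ.get(name)
        if value:
            report[name] = {"value": value, "exists": os.path.exists(value)}
    return report


if __name__ == "__main__":
    print(trust_store_overrides() or "no trust-store overrides set")
```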

martindurant, Oct 22 '20 15:10

(ping)

martindurant, Nov 11 '20 21:11

Hi @martindurant, sorry, I've been pretty busy. I'm going to dig in with a debugger and update you. For now, let me share some thoughts:

  • In my opinion, if I don't use async mode, gcsfs should not use asyncio.
  • All other HTTP-related libraries work without any third-party SSL or encryption libraries.

v-hunt, Nov 25 '20 17:11

If I don't use async mode, it should not use asyncio

Making that distinction would mean writing two separate implementations, with double the code. Even if you don't use asyncio directly, you may still appreciate the concurrent bulk operations it provides.
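The idea of a single async implementation behind blocking methods can be illustrated with the simplified, hypothetical sketch below. It is not fsspec's actual code: a private event loop runs in a background thread, a blocking sync() call submits a coroutine to it and waits, and a bulk operation fans out with asyncio.gather. The fake_fetch coroutine is a stand-in for one network request.

```python
import asyncio
import threading


class LoopRunner:
    """Run coroutines on a private event loop from ordinary blocking code."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        # The loop runs in a daemon thread, so callers need not be async.
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self._thread.start()

    def sync(self, coro, timeout=None):
        # Schedule the coroutine on the background loop and block for its result.
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout)

    def close(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


async def fake_fetch(path, delay=0.1):
    # Hypothetical stand-in for fetching one object over the network.
    await asyncio.sleep(delay)
    return f"contents of {path}"


async def fetch_many(paths):
    # All requests are in flight at once; gather preserves input order.
    return await asyncio.gather(*(fake_fetch(p) for p in paths))
```

With this design, runner.sync(fetch_many(paths)) looks like an ordinary blocking call, yet ten 0.1 s fake requests finish in roughly 0.1 s instead of 1 s. As far as we know, fsspec's async filesystems apply the same idea when methods such as cat are given a list of paths.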

martindurant, Nov 25 '20 17:11

https://github.com/aio-libs/aiohttp/issues/5375#issuecomment-791034670 solved the problem for me.
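This thread does not record what the linked aiohttp comment recommends, so the following is not necessarily that fix. A commonly used workaround for this exact OpenSSL error is to point OpenSSL's default CA file at certifi's bundle through the SSL_CERT_FILE environment variable, which ssl.create_default_context() (and therefore aiohttp) honors. To be safe, do it before gcsfs or aiohttp are imported, since some aiohttp versions may build their default SSL context early. The helper name is hypothetical, and defaulting to certifi assumes that package is installed.

```python
import os
import ssl


def point_openssl_at(bundle_path=None):
    """Make OpenSSL's default verify paths use a specific CA bundle file.

    Returns the CA file Python now reports as its default.
    """
    if bundle_path is None:
        import certifi  # assumption: certifi is installed (requests depends on it)
        bundle_path = certifi.where()
    if not os.path.isfile(bundle_path):
        raise FileNotFoundError(bundle_path)
    # Overrides any existing SSL_CERT_FILE for this process and its children.
    os.environ["SSL_CERT_FILE"] = bundle_path
    return ssl.get_default_verify_paths().cafile


# Example usage, before importing gcsfs:
#   point_openssl_at()
#   import gcsfs
#   fs = gcsfs.GCSFileSystem(project="my-project")
```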

lorabit110, Sep 13 '23 20:09

@lorabit110, do you know which version that was released in?

martindurant, Sep 13 '23 21:09