AzureVMCluster raises distributed.comm.core.FatalCommClosedError on startup
In a new Python 3.7 conda environment:
$ pip install dask-cloudprovider[azure]
$ az login
$ python
from dask_cloudprovider.azure import AzureVMCluster
resource_group = "NGC-AML-Quick-Launch"
vnet = "NGC-AML-Quick-Launch-vnet"
security_group = "NGC-AML-Quick-Launch-nsg"
initial_node_count = 2
vm_name = "Standard_NC6s_v3"
location = "South Central US"
base_dockerfile = "rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04-py3.7"
cluster = AzureVMCluster(
    resource_group=resource_group,
    location=location,
    vnet=vnet,
    security_group=security_group,
    n_workers=initial_node_count,
    vm_size=vm_name,
    docker_image=base_dockerfile,
    docker_args="--privileged",
    worker_class="dask_cuda.CUDAWorker",
)
Creating scheduler instance
Assigned public IP
Network interface ready
Creating VM
Created VM dask-455260e7-scheduler
Waiting for scheduler to run at 13.84.221.226:8786
Scheduler is running
Traceback (most recent call last):
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\comm\tcp.py", line 363, in connect
ip, port, max_buffer_size=MAX_BUFFER_SIZE, **kwargs
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\tornado\tcpclient.py", line 289, in connect
False, ssl_options=ssl_options, server_hostname=host
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\tornado\iostream.py", line 1391, in _do_ssl_handshake
self.socket.do_handshake()
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\ssl.py", line 1139, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1091)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 10, in <module>
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\dask_cloudprovider\azure\azurevm.py", line 496, in __init__
super().__init__(**kwargs)
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\dask_cloudprovider\generic\vmcluster.py", line 284, in __init__
super().__init__(**kwargs, security=self.security)
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\deploy\spec.py", line 281, in __init__
self.sync(self._start)
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\deploy\cluster.py", line 189, in sync
return sync(self.loop, func, *args, **kwargs)
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\utils.py", line 340, in sync
raise exc.with_traceback(tb)
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\utils.py", line 324, in f
result[0] = yield future
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\tornado\gen.py", line 762, in run
value = future.result()
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\dask_cloudprovider\generic\vmcluster.py", line 324, in _start
await super()._start()
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\deploy\spec.py", line 314, in _start
await super()._start()
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\deploy\cluster.py", line 73, in _start
comm = await self.scheduler_comm.live_comm()
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\core.py", line 747, in live_comm
**self.connection_args,
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\comm\core.py", line 288, in connect
timeout=min(intermediate_cap, time_left()),
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\asyncio\tasks.py", line 442, in wait_for
return fut.result()
File "C:\Users\mreyesgomez\Anaconda3\envs\AzureVMCluster_3_7__01_28_2021_no_change\lib\site-packages\distributed\comm\tcp.py", line 376, in connect
raise FatalCommClosedError() from err
distributed.comm.core.FatalCommClosedError
- Dask version:
- Python version: 3.7
- Operating System:
- Install method (conda, pip, source): pip
Cluster connections are now secure by default.
It looks like you may need to update your local version of OpenSSL.
Alternatively, as a workaround, try setting the security=False kwarg.
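A minimal sketch of that workaround, reusing the values from the report above. The helper names azure_cluster_kwargs and start_cluster are hypothetical and only for illustration; the only change to the original call is the security=False keyword, which, per the note above, turns off the secure-by-default connections.

```python
def azure_cluster_kwargs(secure=True):
    """Build keyword arguments for AzureVMCluster (values copied from the report)."""
    kwargs = dict(
        resource_group="NGC-AML-Quick-Launch",
        location="South Central US",
        vnet="NGC-AML-Quick-Launch-vnet",
        security_group="NGC-AML-Quick-Launch-nsg",
        n_workers=2,
        vm_size="Standard_NC6s_v3",
        docker_image="rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04-py3.7",
        worker_class="dask_cuda.CUDAWorker",
    )
    if not secure:
        # Workaround suggested above: opt out of secure connections.
        kwargs["security"] = False
    return kwargs


def start_cluster(secure=True):
    # Imported lazily so the helper above can be inspected without
    # dask-cloudprovider installed or Azure credentials configured.
    from dask_cloudprovider.azure import AzureVMCluster

    return AzureVMCluster(**azure_cluster_kwargs(secure=secure))
```

Calling start_cluster(secure=False) would then create real Azure VMs with unencrypted scheduler traffic, so this is probably only suitable for debugging.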
Setting security=False does not seem to work; I am getting the following:
Waiting for scheduler to run at 52.171.62.23:8786
Scheduler is running
Traceback (most recent call last):
File "
Have you tried updating OpenSSL?
@quasiben
I am using a pretty recent one:
$ openssl version
OpenSSL 1.1.1i  8 Dec 2020
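Note that the openssl command-line tool is not necessarily the same build that Python's ssl module is linked against, and the traceback goes through Python's ssl module. A quick way to check, as a sketch (the function name python_openssl_info is hypothetical; ssl.OPENSSL_VERSION and ssl.OPENSSL_VERSION_INFO are standard-library attributes):

```python
import ssl


def python_openssl_info():
    """Return the OpenSSL version string and version tuple used by this Python."""
    return ssl.OPENSSL_VERSION, ssl.OPENSSL_VERSION_INFO


version, info = python_openssl_info()
# Compare this with the output of `openssl version` in the same environment.
print(version)
print(info)
```

If the two differ, updating the system openssl binary alone may not change what the Dask client uses.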
I tried removing the docker_args="--privileged" argument and got the same behavior.
I also ran it from the RAPIDS container, with the same result.
@quasiben Many people have tried this now and are getting the same error, with a recent OpenSSL version (12/2020).
security=False works on a Linux machine. I will check again whether I still get the errors I reported above on a Windows machine.