
Google Cloud: creating a cluster fails, as the toil_leader has incorrect hostname

Open danieldanciu opened this issue 6 years ago • 1 comments

Creating a toil cluster on Google Cloud with a command like: toil launch-cluster <cluster_name> --provisioner gce --leaderNodeType n1-standard-1 --keyPairName <username> --zone europe-west6-c

fails because the docker instance on the toil_leader machine can't start. The reason seems to be that the value in /etc/hostname is: l.europe-west6-c.c.<cluster_name>

and this can't be resolved by the DNS service. I don't understand how it was possible to create an instance with an invalid hostname in the first place. One can work around this particular problem by passing an explicit --hostname parameter to the docker run command (see https://github.com/danieldanciu/toil/compare/master...danieldanciu:dd/gce), but the startup script then fails later, when mesos also tries to resolve the name of the current host. So the correct approach would be to fix the invalid hostname on the instance itself, but I have no idea how to do that.

If I create a Google Cloud instance manually or via the REST API, the hostname in /etc/hostname doesn't have the 'europe-west6-c.c.<cluster_name>' suffix and the DNS server resolves it correctly. Similarly, appending '.internal' to the long hostname also fixes the issue.
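For anyone who wants to reproduce the check on the leader, here is a minimal Python sketch of what I tested by hand. It tries the hostname as-is, then with '.internal' appended, then the short name before the first dot, and reports the first one that resolves. The function names `resolves` and `first_resolvable` are made up for illustration and are not part of toil; the lookup function is injectable so the logic can be exercised without real DNS.

```python
import socket
from typing import Callable, Optional


def resolves(name: str) -> bool:
    """Return True if the system resolver can look up `name`."""
    try:
        socket.gethostbyname(name)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # names that cannot be encoded as valid DNS labels.
        return False


def first_resolvable(
    hostname: str,
    can_resolve: Callable[[str], bool] = resolves,
) -> Optional[str]:
    """Return the first variant of `hostname` that resolves, or None.

    Variants tried, in order (hypothetical helper, mirrors the manual checks):
      1. the hostname exactly as given
      2. the hostname with '.internal' appended
      3. the short name before the first dot
    """
    candidates = [
        hostname,
        hostname + ".internal",
        hostname.split(".", 1)[0],
    ]
    for candidate in candidates:
        if can_resolve(candidate):
            return candidate
    return None
```

On the leader, calling `first_resolvable(socket.gethostname())` should show which form of the name, if any, the DNS server accepts. This only diagnoses the problem; it does not change the hostname on the instance.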

┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-480

danieldanciu avatar Jan 08 '20 20:01 danieldanciu

I don't have much of a clue as to why the hostnames on the provisioned machines end up looking like that.

Do you see the same thing in e.g. us-west1-a? Maybe this is somehow region-specific?

It looks like we're using libcloud's create_node to talk to Google, but I don't see anything in the libcloud docs or our usage of the library that suggests we are asking for the hostname to be anything special.

When you look at the created instance in Google's web UI, as described in https://cloud.google.com/compute/docs/instances/custom-hostname-vm#verifying_the_custom_hostname, does it look like it has a custom hostname assigned to it?

adamnovak avatar Jan 22 '20 23:01 adamnovak