
Upgrade to version 0.22.1 caused web worker to fail

Open beccar97 opened this issue 3 years ago • 5 comments

I'm in the process of upgrading our installation of control-tower, which had fallen behind (on v19.1) due to the need for a manual update of our database. I've been going a major version at a time, and yesterday tried to update from version 0.21.0 to version 0.22.1, using the self-update job (with the version pinned). The update ran without issue and appeared to succeed, but after it completed the web UI for Concourse was completely unavailable. After authenticating to our instance and running bosh vms, I could see that the web process was failing. I SSHed into the web VM to look at the logs, and found that web.stderr.log contained several entries of:

error: failed to connect to database: x509: certificate relies on legacy Common Name field, use SANs instead

I redeployed the instance with version 0.21.0 of control-tower and that was successful, and the recreated web instance had no issues, so this seems to be an issue specific to 0.22.x. Was anything changed about the database connection in this version?

beccar97 avatar Dec 08 '22 12:12 beccar97

We experienced the same issue, using self-update to upgrade from 0.21.0 to 0.22.0. Our web instance failed and the same x509: certificate relies on legacy Common Name field as shown above appeared in our web.stderr.log. During this time our web UI was inaccessible to users.

Running bosh restart to redeploy succeeded for us, and our web instance is running successfully again now.

Can anyone confirm there was something specific in upgrading to 0.22.0 that caused this?

zhibek avatar Feb 15 '23 12:02 zhibek

We experienced the same problem after attempting to upgrade to control-tower 0.23.0 & 0.24.0.

After digging deeper on this, I think the root cause may be golang version changing and this golang 1.15+ issue being hit: https://github.com/golang/go/issues/39568

We haven't got a fix yet, but hopefully this helps anyone who hits similar problems.

zhibek avatar Feb 15 '23 18:02 zhibek

running into this as well: tried bumping the cert on RDS manually -> web refuses to start up because it doesn't trust that yet. I reckon https://github.com/EngineerBetter/control-tower/blob/master/db/rds_root_cert.go needs to be updated

aranair avatar Nov 20 '23 23:11 aranair

@aranair FYI We solved this issue by temporarily upgrading the RDS CA to something other than rds-ca-2019, then immediately reverting it to rds-ca-2019. This appears to have given us a version of rds-ca-2019 initiated after July 2020, avoiding the problem of pre-July 2020 rds-ca-2019 with golang 1.15+.

zhibek avatar Nov 21 '23 08:11 zhibek

lol wow, thanks for that! I was compiling the control-tower binary from source w the new global bundles but was running into other issues (arg length and keytool not recognizing and whatnot)

aranair avatar Nov 21 '23 15:11 aranair