ACME cert generation failed - inconsistent DNS
Tried `defang cert gen` and ran into this:
➜ hasura git:(hasura-rds-example) ✗ COMPOSE_PROJECT_NAME=hasura-rds defang cert gen
# Using AWS provider
Triggering Let's Encrypt cert generation for hasura.hasura-rds-demo-1.goec.dev
Please setup CNAME record for hasura.hasura-rds-demo-1.goec.dev to point to ALB defang-hasura-rds-beta-alb-457580578.us-east-1.elb.amazonaws.com, waiting for CNAME record setup and DNS propagation
Note: DNS propagation may take a while, we will proceed as soon as the CNAME record is ready, checking...
hasura.hasura-rds-demo-1.goec.dev DNS is properly configured!
Triggering cert generation for hasura.hasura-rds-demo-1.goec.dev
Waiting for TLS cert to be online for hasura.hasura-rds-demo-1.goec.dev
Error waiting for TLS to be online: context deadline exceeded
Please check for error messages from `/aws/lambda/acme-lambda` log group in cloudwatch for more details
It looks like the DNS checks are returning inconsistent results: cert gen initially sees the DNS as correctly configured, but by the time it sends the actual request, the check no longer passes. This may have something to do with caching at one or more levels...
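One way to test the caching theory is to ask several public resolvers for the same record and compare their answers. The sketch below is only an illustration, not what `defang` does internally: the resolver addresses and the helper names (`lookupCNAMEVia`, `disagreeing`) are assumptions, while the hostname and ALB name are copied from the output above.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"strings"
	"time"
)

// lookupCNAMEVia resolves the canonical name for host through one specific
// DNS server (for example "8.8.8.8:53") instead of the system resolver.
// Hypothetical helper, not part of the defang CLI.
func lookupCNAMEVia(ctx context.Context, server, host string) (string, error) {
	r := &net.Resolver{
		PreferGo: true, // use Go's resolver so the custom Dial below is honored
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 5 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
	return r.LookupCNAME(ctx, host)
}

// normalize lowercases a DNS name and strips the trailing dot.
func normalize(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}

// disagreeing returns, in sorted order, the servers whose answer does not
// match the expected target. A failed lookup is recorded as "".
func disagreeing(answers map[string]string, want string) []string {
	var bad []string
	for server, got := range answers {
		if normalize(got) != normalize(want) {
			bad = append(bad, server)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	host := "hasura.hasura-rds-demo-1.goec.dev"
	want := "defang-hasura-rds-beta-alb-457580578.us-east-1.elb.amazonaws.com"
	servers := []string{"8.8.8.8:53", "1.1.1.1:53", "9.9.9.9:53"} // assumed resolvers

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	answers := make(map[string]string)
	for _, s := range servers {
		cname, err := lookupCNAMEVia(ctx, s, host)
		if err != nil {
			cname = "" // treat lookup errors as a mismatch
		}
		answers[s] = cname
		fmt.Printf("%-12s -> %q\n", s, cname)
	}
	if bad := disagreeing(answers, want); len(bad) > 0 {
		fmt.Println("resolvers that disagree:", bad)
	}
}
```

If the resolvers disagree across repeated runs, that would point at propagation or caching rather than the cert workflow itself.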
@raphaeltm do we have the logs from the `/aws/lambda/acme-lambda` log group?
Root cause: a timeout while waiting for the generated cert to be attached to the ALB. Going through the lambda log shows:
- ACME workflow has been triggered successfully
- Cert is generated successfully
- Cert Attach API call was successful
- But the lambda function that checks whether the ALB is serving the correct cert timed out after 2 min
Current solution: increase the cert attachment timeout to 10 min
Is this still an issue? @edwardrf