Fix CICD Pipeline: Upgrade Ubuntu version on Azure DevOps agent, Go dependency errors, and more
Proposed changes
Fix issues in CICD pipeline causing builds and status checks to fail, blocking new features from being merged.
- Upgrade Ubuntu agent version to `20.04`
  - `16.04` was decommissioned in October 2021. More info here.
- Fix errors in pipeline when fetching Go dependencies for current linting tools
  - Go `1.15` and `1.16` are deprecated, causing newer versions of linting tools to fail when installing. NOTE: Project has been upgraded to `1.17` as part of resolving these errors.
- WIP: Fix issue in PR pipeline where newly leased ephemeral AWS account is not being used when setting up infra for testing. (Current understanding of problem here)
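To make the Go version requirement noted above fail fast, a pipeline step could check the installed Go version before the linting tools are installed. This is only a sketch under stated assumptions: the helper name `version_at_least` is hypothetical, it is not taken from this repo's scripts, and it relies on `sort -V` from GNU coreutils being available on the Ubuntu agent.

```shell
#!/usr/bin/env bash
# Hypothetical pre-install guard; not part of the existing pipeline scripts.

# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use in a pipeline step (assumes `go` is on PATH):
#   current="$(go version | awk '{print $3}')"   # e.g. "go1.17.1"
#   current="${current#go}"                      # strip the "go" prefix
#   version_at_least "$current" "1.17" || { echo "Go $current is too old" >&2; exit 1; }
```

Failing at this point would give a clearer message than a linting tool's install error further down the step.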
Types of changes
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] Refactor (changes to code, which do not change application behavior)
Checklist
- [x] I have filled out this PR template
- [x] I have read the CONTRIBUTING doc
- [ ] I have added automated tests that prove my fix is effective or that my feature works
- [x] I have added necessary documentation (`README.md`, inline comments, etc.)
- [x] I have updated the `CHANGELOG.md` under a `## next` release, with a short summary of my changes
Relevant Links
Further comments
@shubydo Can you complete the Checklist section? @nathanagood could you do a review for this PR? Thanks,
@eric-w-hart @nathanagood pipeline is having some errors when trying to install the linting tools. Still needs to be fixed before the checks will pass
golang lint is throwing an error and Terraform is out of date. Researching both.
@eric-w-hart @nathanagood This is now fixed. It's now pending deployment here. Needs an approval.
Let's get the pipeline passing as part of this PR and do the terraform upgrade in subsequent PRs. Once the pipeline is fixed this will help unblock the pending PRs.
Observations & current understanding of pipeline flow
- Once the initial stage (`TestAndBuild`) has linted, run the tests, and created a build artifact, the `Deploy` stage starts and leases an account to run Terraform and functional tests against. Afterwards, the leased account is destroyed.
- After the lease is created, we "log in" to the leased account (similar to assume role) using the `dce login` command. This writes the new AWS creds to the default path in the user's home directory, `~/.aws/credentials`. At this point the agent should be authenticated using the `DCEPrincipal-xxx` role tied to that leased account, but it is not. However, running the same dce CLI commands locally works as expected.
  - Current suspicion on why the remaining steps may be failing: the path on the agent where creds are stored. The creds created by the `dce login` CLI command are written to the default location that the AWS CLI looks for, `~/.aws/credentials` or `$HOME/.aws/credentials`, but `~` on the build agent may resolve to a different path than expected, so the creds cannot be resolved in subsequent steps?
    - The next step in the pipeline runs a script to create an S3 bucket for the tf backend in the leased account, but it fails because it is not authenticated as `DCEPrincipal-xxx`, which has permissions to perform the needed S3 actions.
TL;DR: The credentials that the `dce login` command creates and writes to a file don't seem to be used by subsequent `aws` CLI commands, causing errors in the Configure Terraform Backend step.
Not sure if this is the root of the current issue, but this is where I'm stuck. Any help looking into this would be greatly appreciated.
Edit: A couple of other things to look into: the path where the AWS Shell Script task (used to initially authenticate with AWS) writes credentials versus the path where the `dce login` command writes them, and where the bash step that runs `aws` commands afterwards pulls creds from.
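One way to test the path suspicion described above is to print where each step expects the credentials file to live. The following is a minimal sketch, not the pipeline's actual script: the function names `resolve_creds_path` and `report_creds_file` are hypothetical, and it assumes the AWS CLI's documented behavior of reading the `AWS_SHARED_CREDENTIALS_FILE` environment variable when it is set and falling back to `$HOME/.aws/credentials` otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers; not part of the existing pipeline scripts.

# Print the shared credentials file path the AWS CLI would use:
# AWS_SHARED_CREDENTIALS_FILE if set and non-empty, otherwise $HOME/.aws/credentials.
resolve_creds_path() {
  printf '%s\n' "${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}"
}

# Report the path and whether a file exists there, without printing any secrets.
report_creds_file() {
  local path
  path="$(resolve_creds_path)"
  echo "HOME=$HOME"
  echo "credentials path: $path"
  if [ -f "$path" ]; then
    echo "credentials file: present"
  else
    echo "credentials file: missing"
  fi
}
```

Calling `report_creds_file` in the step right after `dce login` and again in the Configure Terraform Backend step would show whether both steps resolve the same path. Running `aws sts get-caller-identity` in each step would additionally show which principal the AWS CLI is actually using.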
Update (07/09/2022):
Current issue/failure now is in Deploy stage here.
I'll look further into this over the weekend. I have some sample AWS scripts from a class I took that may help.