Fix CICD Pipeline: Upgrade Ubuntu version on Azure DevOps agent, Go dependency errors, and more

Open shubydo opened this issue 3 years ago • 7 comments

Proposed changes

Fix issues in CICD pipeline causing builds and status checks to fail, blocking new features from being merged.

  • Upgrade Ubuntu agent version to 20.04

    • 16.04 was decommissioned in October 2021. More info here.
  • Fix errors in pipeline when fetching Go dependencies for current linting tools

    • Go 1.15 and 1.16 are deprecated, causing newer versions of the linting tools to fail when installing. NOTE: The project has been upgraded to Go 1.17 as part of resolving these errors.
  • WIP: Fix issue in PR pipeline where newly leased ephemeral AWS account is not being used when setting up infra for testing. (Current understanding of problem here)

Types of changes

  • [x] Bugfix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [x] Refactor (changes to code, which do not change application behavior)

Checklist

  • [x] I have filled out this PR template
  • [x] I have read the CONTRIBUTING doc
  • [ ] I have added automated tests that prove my fix is effective or that my feature works
  • [x] I have added necessary documentation (README.md, inline comments, etc.)
  • [x] I have updated the CHANGELOG.md under a ## next release, with a short summary of my changes

Relevant Links

Further comments

shubydo avatar May 18 '22 16:05 shubydo

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar May 18 '22 16:05 CLAassistant

@shubydo Can you complete the Checklist segment? @nathanagood could you do a review for this PR? Thanks.

eric-w-hart avatar May 27 '22 09:05 eric-w-hart

@eric-w-hart @nathanagood The pipeline is hitting errors when trying to install the linting tools. These still need to be fixed before the checks will pass.

shubydo avatar May 27 '22 13:05 shubydo

golang lint is throwing an error and Terraform is out of date. Researching both.

eric-w-hart avatar Jun 13 '22 13:06 eric-w-hart

@eric-w-hart @nathanagood This is now fixed. It's now pending deployment here. Needs an approval.

Let's get the pipeline passing as part of this PR and do the terraform upgrade in subsequent PRs. Once the pipeline is fixed this will help unblock the pending PRs.

shubydo avatar Jul 01 '22 22:07 shubydo

Observations & current understanding of pipeline flow

  1. After the initial stage (TestAndBuild) has linted, run the tests, and created a build artifact, the Deploy stage starts and leases an account to run terraform and functional tests against. Afterwards the leased account is destroyed.

  2. After the lease is created, we "log in" to the leased account (similar to assume role) using the dce login command. This writes the new AWS creds to the default path in the user's home directory, ~/.aws/credentials. At this point the agent should be authenticated using the DCEPrincipal-xxx role tied to that leased account, but it is not. However, when running the same dce CLI commands locally, everything works as expected.

    • Current suspicion on why the remaining steps may be failing: the path on the agent where creds are stored. The creds created by the dce login CLI command are written to the default location that the AWS CLI looks for (~/.aws/credentials, i.e. $HOME/.aws/credentials), but ~ may resolve to a different path on the build agent than expected, so subsequent steps may be unable to find them.
      • The next step in the pipeline runs a script to create an S3 bucket for the tf backend in the leased account, but it fails because it is not authenticated as DCEPrincipal-xxx, the role that has permissions to perform the needed S3 actions.

TL;DR: The credentials that the dce login command creates and writes to a file don't seem to be used by subsequent aws CLI commands, causing errors in the Configure Terraform Backend step.

I'm not sure whether this is the root of the current issue, but this is where I’m stuck. Any help looking into this would be greatly appreciated.

Edit: A couple of other things to look into: where the AWS Shell Script task (used to authenticate with AWS initially) writes its credentials versus where the dce login command writes them, and where the bash step that runs aws commands afterwards pulls its creds from.
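One way to narrow this down on the agent is a small diagnostic step, sketched below under some assumptions. It covers only two of the credential sources the AWS CLI consults: the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables, which take precedence, and the shared credentials file, whose default location ~/.aws/credentials can be overridden with AWS_SHARED_CREDENTIALS_FILE. The function name credential_source is made up for illustration, and whether the AWS Shell Script task actually exports credentials as environment variables on this agent is an assumption to verify, not something established in this thread.

```python
import os
from pathlib import Path


def credential_source(env=None):
    """Guess where the AWS CLI would read credentials from.

    Hypothetical helper: it checks only environment variables and the
    shared credentials file, not profiles, SSO, or instance roles.
    """
    env = os.environ if env is None else env

    # Environment variables win over the shared credentials file, so creds
    # exported by an earlier task would shadow whatever dce login wrote.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"

    # AWS_SHARED_CREDENTIALS_FILE overrides the default ~/.aws/credentials.
    override = env.get("AWS_SHARED_CREDENTIALS_FILE")
    if override:
        path = Path(override)
    else:
        home = env.get("HOME") or str(Path.home())
        path = Path(home) / ".aws" / "credentials"

    state = "exists" if path.is_file() else "missing"
    return f"shared credentials file {path} ({state})"


if __name__ == "__main__":
    # Print what this step sees; compare the output before and after dce login.
    print("HOME =", os.environ.get("HOME"))
    print("credentials would come from:", credential_source())
```

Running this right after the dce login step and again inside the Configure Terraform Backend step would show whether the two steps agree on HOME, and whether environment variables are shadowing the file that dce login wrote.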

Update (07/09/2022):

Current issue/failure now is in Deploy stage here.

shubydo avatar Jul 09 '22 12:07 shubydo

I'll look further into this over the weekend. I have some sample AWS scripts from a class I took that may help.

eric-w-hart avatar Jul 09 '22 12:07 eric-w-hart