onepassword-operator

Annotations Cause GKE AutoPilot to Enter Create/Destroy Loop

Open kylekurz opened this issue 4 years ago • 3 comments

Your environment

Operator Version: Deployed via Helm Chart 1.7.0 (Connect API 1.5.0)

Connect Server Version: Deployed via Helm Chart 1.7.0 (Connect API 1.5.0)

Kubernetes Version: GKE AutoPilot (1.21.5-gke.1302)

What happened?

Creating a deployment with an annotation after the initial Connect deployment triggers hundreds of pods to start and then terminate. I can never reach a stable state when using annotations.

What did you expect to happen?

A new deployment with annotations, or adding or removing annotations on an existing deployment, should be just as stable as any other deployment.

Steps to reproduce

These are my steps because I find them simplest in that it's only a few commands. It is possible that simpler repro steps exist.

  1. Deploy a GKE AutoPilot Cluster (See attached terraform-gke.zip, replace the project_id placeholder in terraform.tfvars: terraform-gke.zip)
  2. Authenticate kubectl to gcloud: gcloud container clusters get-credentials $(terraform output -raw kubernetes_cluster_name) --region $(terraform output -raw region)
  3. Deploy Connect + Operator (See attached terraform-k8s: terraform-k8s.zip)
  4. Create a test deployment: kubectl apply -f nginx.yaml (nginx.zip)
  5. Observe that pods for the nginx deployment will never stabilize. When you eventually decide to delete the deployment, GKE will show dozens of "Ready" pods with no replicas that will eventually be cleaned up.
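
For readers without the attachments, a deployment of the kind described in step 4 might look roughly like the sketch below. This is not the contents of nginx.zip: the vault, item, and secret names are hypothetical placeholders, and the only parts taken from the operator's documented usage are the two `operator.1password.io` annotation keys.

```yaml
# Sketch of an annotated Deployment, assuming the operator's documented
# item-path and item-name annotations. All names are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  annotations:
    # Where the operator should read the 1Password item from (placeholder path).
    operator.1password.io/item-path: "vaults/example-vault/items/example-item"
    # Name of the Kubernetes Secret the operator should create (placeholder).
    operator.1password.io/item-name: "nginx-secret"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          ports:
            - containerPort: 80
```

Watching the pods after applying a manifest like this (for example with `kubectl get pods -l app=nginx --watch`) is one way to see whether the create/destroy loop occurs.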

Notes & Logs

I've attached two zip files and a yaml deployment that can recreate this in a GKE AutoPilot cluster. I'm more than happy to hop on a call and help debug if that's useful. I've not seen this happen on GKE Standard clusters, so I'm not 100% sure what's different in that environment that is causing this to happen.

Also note: I have an open question with a 1Password Solutions Architect, because annotations would be insufficient even if they were working correctly. When I add an annotation to a deployment, I expect the created secret to be injected into that deployment, but I cannot see how that happens today. The secret is created, but I have no access to it inside a Terraform-created deployment that references it. This is likely a second bug/feature request, but I wanted to make sure full context was available here.
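
On the injection question, one possible reading is that the operator only creates the Kubernetes Secret and leaves the pod spec alone, in which case the deployment has to reference the Secret itself. The fragment below is a sketch under that assumption, using standard Kubernetes `envFrom`; the secret name `nginx-secret` is a hypothetical placeholder and would need to match the value of the `item-name` annotation.

```yaml
# Fragment of a Deployment's pod template (spec.template.spec), not a full manifest.
# Assumes the operator has already created a Secret named "nginx-secret" (hypothetical).
containers:
  - name: nginx
    image: nginx:1.21
    envFrom:
      # Expose every key in the Secret as an environment variable.
      - secretRef:
          name: nginx-secret
```

The same reference can be expressed in a Terraform-managed deployment; whether that resolves the access problem described above is not confirmed in this thread.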

kylekurz avatar Dec 13 '21 15:12 kylekurz

Apologies, I had a small error in the nginx.yaml file that I attached: it was missing the item-name annotation. The issue still exists after fixing that, but here's the updated deployment: nginx.zip

kylekurz avatar Dec 14 '21 14:12 kylekurz

@jpcoenen any idea when something like this might be addressed? I know you'd previously engaged with me on a couple other tickets and see you're a contributor here too.

kylekurz avatar Dec 29 '21 14:12 kylekurz

Hey @kylekurz,

We have a couple of other initiatives ongoing at 1Password at the moment, so it's a bit challenging to give you an estimate. However, we do understand that this is a frustrating problem, and we'll come back to you once we've investigated further.

edif2008 avatar Jun 08 '22 18:06 edif2008