
Reconciler error in Gatekeeper controller pod

Open akchamps opened this issue 4 years ago • 6 comments

What steps did you take and what happened: Deployed Gatekeeper and created a constraint template. After creating the constraint template, the Gatekeeper controller pod logs showed the errors below:

kubectl logs -n gatekeeper-system -l control-plane=controller-manager -f

1", "kind": "AllowedImageRepos", "name": "blossom-allowed-repos"}} 2021-10-22T11:01:29.493Z error controller Reconciler error {"controller": "constraint-controller", "name": "gvk:AllowedImageRepos.v1beta1.constraints.gatekeeper.sh:blossom-allowed-repos", "namespace": "", "error": "Constraint kind AllowedImageRepos is not recognized"} github.com/go-logr/zapr.(*zapLogger).Error /go/src/github.com/open-policy-agent/gatekeeper/vendor/github.com/go-logr/zapr/zapr.go:128 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler /go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:246 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem /go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218 sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker /go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1 /go/src/github.com/open-policy-agent/gatekeeper/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 k8s.io/apimachinery/pkg/util/wait.BackoffUntil /go/src/github.com/open-policy-agent/gatekeeper/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 k8s.io/apimachinery/pkg/util/wait.JitterUntil /go/src/github.com/open-policy-agent/gatekeeper/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 k8s.io/apimachinery/pkg/util/wait.Until /go/src/github.com/open-policy-agent/gatekeeper/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90

What did you expect to happen: There should have been no error.


Environment: Testing

  • Gatekeeper version: 3.3.0
  • Kubernetes version: 1.16

akchamps avatar Oct 22 '21 11:10 akchamps

It looks like this release is a bit old.

If you update to a more recent release, are you still seeing errors?

maxsmythe avatar Oct 26 '21 01:10 maxsmythe

This was likely fixed by:

https://github.com/open-policy-agent/gatekeeper/pull/1240

maxsmythe avatar Oct 26 '21 01:10 maxsmythe

@maxsmythe Hi. Firstly, thank you for maintaining Gatekeeper. We use image: openpolicyagent/gatekeeper:v3.7.0, but the following error was raised. Is this already solved in a newer release?

{
  "level": "error",
  "ts": 1641973648.2500007,
  "logger": "controller-runtime.manager.controller.constraint-controller",
  "msg": "Reconciler error",
  "name": "gvk:RequireFieldKeyConstraint.v1beta1.constraints.gatekeeper.sh:require-template-labels",
  "namespace": "",
  "error": "Constraint kind RequireFieldKeyConstraint is not recognized",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/open-policy-agent/gatekeeper/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"
}

Logs and YAMLs are available at https://gist.github.com/krrrr38/721f1109947a07e06d68e0680fab3c12

I'm not sure whether it's related, but after this error was raised, gatekeeper-audit went into CrashLoopBackOff.

Thanks.

krrrr38 avatar Jan 12 '22 07:01 krrrr38

Firstly, thank you for maintaining Gatekeeper.

Thank you for the thank you!

We use image: openpolicyagent/gatekeeper:v3.7.0, but the following error was raised. Is this already solved in a newer release?

There was a release (prior to 3.7.0) where this error was more common, but that was fixed, so this log line should be rarer now. I'm not sure what would cause it in this instance, but there are a few possible race conditions. Operationally, this line is harmless, though if it isn't transient it's worth looking into.

Because the pod is crashlooping, this might be an indirect symptom... if the pod writes out a status resource and then the container within the pod restarts, we could be trying to sync the constraint before the constraint template has been ingested (harmless, but it will raise that error until the constraint template has been ingested).
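A quick way to check whether a template has been ingested is to read the ConstraintTemplate's status (a sketch; the template name here is assumed to be the lowercased kind from the log above):

kubectl get constrainttemplate requirefieldkeyconstraint -o jsonpath='{.status.created}'
# prints "true" once Gatekeeper has ingested the template and its constraint kind is recognized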

I'm not sure after this raised, gatekeeper-audit become CrashLoopBackoff.

This seems to be your biggest concern, right? Unfortunately there is no information on what's causing the crash. My guess would be OOMing... does the status of the pod state what caused the crash loop backoff? Can you paste a copy of the pod resource that includes its status?
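For reference, something like this shows why the container last died; the control-plane=audit-controller label matches the standard Gatekeeper manifests, so adjust it if your labels differ:

kubectl -n gatekeeper-system get pods -l control-plane=audit-controller \
  -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'
# "OOMKilled" here confirms the container exceeded its memory limit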

If it is OOMing, try increasing the memory allocated to the audit pod. You might also try setting the audit chunk size to lower audit's memory requirements:

https://open-policy-agent.github.io/gatekeeper/website/docs/audit#configuring-audit
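A sketch of both knobs via the Helm chart; the --audit-chunk-size flag is documented on the page above, but the value names are assumed from the chart and 2Gi / 500 are placeholders to tune for your cluster:

helm upgrade gatekeeper gatekeeper/gatekeeper -n gatekeeper-system \
  --reuse-values \
  --set audit.resources.limits.memory=2Gi \
  --set auditChunkSize=500

Chunking makes audit list objects in batches of the given size instead of pulling every object of a kind into memory at once, which is typically what drives the peak memory usage.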

maxsmythe avatar Jan 13 '22 03:01 maxsmythe

Thank you for the quick reply. As you said, the pod was killed by OOM. I will try increasing resources and give it some time, and will report back if the error is raised again. Thanks 🤗


UPDATE: After setting resources.limits.memory to 2Gi, the pod recovered. In my case, the gatekeeper-audit pod uses 926Mi of memory.
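For reference, the fix described corresponds roughly to this in the gatekeeper-audit Deployment; the container name and the request value are assumptions, and only the 2Gi limit is from the report above:

containers:
  - name: manager
    resources:
      limits:
        memory: 2Gi
      requests:
        memory: 512Mi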

krrrr38 avatar Jan 13 '22 06:01 krrrr38

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 23 '22 02:07 stale[bot]
