Cluster agent cannot update the `datadogtoken` ConfigMap

Open lambohamp opened this issue 3 years ago • 0 comments

Describe what happened: I deployed the Datadog Operator Helm chart (version 0.8.5) and a DatadogAgent resource. The cluster agent doesn't send kubernetes_state (or kubernetes_state_core) metrics to Datadog. When I run agent status on the cluster agent, the kubernetes_apiserver check reports the error shown below.

Output of the cluster agent's status:

===============================
Datadog Cluster Agent (v1.22.0)
===============================

  Status date: 2022-08-03 06:01:08.884 UTC (1659506468884)
  Agent start: 2022-08-03 05:40:47.19 UTC (1659505247190)
  Pid: 1
  Go Version: go1.17.11
  Build arch: amd64
  Agent flavor: cluster_agent
  Check Runners: 8
  Log Level: INFO

 ...

Leader Election
===============
  Leader Election Status:  Running
  Leader Name is: datadog-agent-cluster-agent-7bdd857659-4jdr4
  Last Acquisition of the lease: Wed, 03 Aug 2022 05:41:57 UTC
  Renewed leadership: Wed, 03 Aug 2022 06:01:00 UTC
  Number of leader transitions: 6 transitions

...

=========
Collector
=========

  Running Checks
  ==============
  ...
    
    kubernetes_apiserver
    --------------------
      Instance ID: kubernetes_apiserver [WARNING]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
      Total Runs: 82
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 3, Total: 91
      Service Checks: Last Run: 3, Total: 231
      Average Execution Time : 1.995s
      Last Execution Date : 2022-08-03 06:01:06 UTC (1659506466000)
      Last Successful Execution Date : 2022-08-03 06:01:06 UTC (1659506466000)
      Error: configmaps "datadogtoken" is forbidden: User "system:serviceaccount:datadog:datadog-agent-cluster-agent" cannot update resource "configmaps" in API group "" in the namespace "datadog": Azure does not have opinion for this user.
      No traceback
        
    
    kubernetes_state_core
    ---------------------
      Instance ID: kubernetes_state_core:476f5ca87ed00165 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_state_core.d/kubernetes_state_core.yaml.default
      Total Runs: 81
      Metric Samples: Last Run: 5,754, Total: 437,390
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 27, Total: 2,052
      Average Execution Time : 41ms
      Last Execution Date : 2022-08-03 06:00:56 UTC (1659506456000)
      Last Successful Execution Date : 2022-08-03 06:00:56 UTC (1659506456000)
      
 ...

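The ClusterRole created for the cluster agent doesn't seem to grant the update verb on the datadogtoken ConfigMap. In the rules listed below, update on ConfigMaps is only allowed for datadog-leader-election, datadog-custom-metrics, and datadog-cluster-id.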
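One way to confirm the missing permission, as a minimal sketch: ask the API server directly with kubectl auth can-i while impersonating the cluster agent's service account. The namespace and service account name are taken from the error message above; impersonation with --as assumes the caller is allowed to impersonate.

```shell
# Values taken from the error message in the status output.
NAMESPACE=datadog
SERVICE_ACCOUNT=datadog-agent-cluster-agent
SUBJECT="system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"

# Ask whether the service account may update the datadogtoken ConfigMap.
# kubectl prints "yes" or "no".
kubectl auth can-i update configmaps/datadogtoken --namespace "$NAMESPACE" --as "$SUBJECT"

# For comparison: a ConfigMap that the ClusterRole names explicitly.
kubectl auth can-i update configmaps/datadog-leader-election --namespace "$NAMESPACE" --as "$SUBJECT"
```

Given the rules in the ClusterRole, the first query would be expected to answer no and the second yes, though this was not run against the cluster from this report.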

Service Account:

$ kubectl get sa datadog-agent-cluster-agent -o yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/instance: datadog-agent
    app.kubernetes.io/managed-by: datadog-operator
    app.kubernetes.io/name: datadog-agent-deployment
    app.kubernetes.io/part-of: datadog-datadog--agent
    app.kubernetes.io/version: ""
  name: datadog-agent-cluster-agent
  ownerReferences:
  - apiVersion: datadoghq.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DatadogAgent
    name: datadog-agent
secrets:
- name: datadog-agent-cluster-agent-token-pdwxm

Cluster Role:

$ kubectl get clusterrole datadog-agent-cluster-agent -o yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/instance: datadog-datadog--agent
    app.kubernetes.io/managed-by: datadog-operator
    app.kubernetes.io/name: datadog-agent-deployment
    app.kubernetes.io/part-of: datadog-datadog--agent
    app.kubernetes.io/version: ""
  name: datadog-agent-cluster-agent
rules:
- apiGroups:
  - ""
  resources:
  - services
  - events
  - endpoints
  - pods
  - nodes
  - componentstatuses
  - configmaps
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - quota.openshift.io
  resources:
  - clusterresourcequotas
  verbs:
  - get
  - list
- nonResourceURLs:
  - /version
  - /healthz
  verbs:
  - get
- apiGroups:
  - ""
  resourceNames:
  - datadog-leader-election
  resources:
  - configmaps
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resourceNames:
  - kube-system
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
  - ""
  resourceNames:
  - datadog-custom-metrics
  resources:
  - configmaps
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resourceNames:
  - extension-apiserver-authentication
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
  - get
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
- apiGroups:
  - datadoghq.com
  resources:
  - datadogmetrics
  verbs:
  - list
  - watch
  - create
  - delete
- apiGroups:
  - datadoghq.com
  resources:
  - datadogmetrics/status
  verbs:
  - update
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  verbs:
  - get
  - list
  - watch
  - create
  - update
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
  - watch
  - create
  - update
- apiGroups:
  - datadoghq.com
  resources:
  - extendeddaemonsetreplicasets
  verbs:
  - get
- apiGroups:
  - apps
  resources:
  - deployments
  - replicasets
  - statefulsets
  - daemonsets
  verbs:
  - get
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - list
  - watch
  - get
- apiGroups:
  - batch
  resources:
  - cronjobs
  verbs:
  - list
  - watch
  - get
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  - namespaces
  verbs:
  - list
- apiGroups:
  - policy
  resources:
  - podsecuritypolicies
  verbs:
  - list
  - get
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - rolebindings
  verbs:
  - list
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - list
- apiGroups:
  - ""
  resourceNames:
  - kube-system
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
  - ""
  resourceNames:
  - datadog-cluster-id
  resources:
  - configmaps
  verbs:
  - get
  - create
  - update
- apiGroups:
  - apps
  resources:
  - deployments
  - replicasets
  - daemonsets
  - statefulsets
  verbs:
  - get
  - list
  - watch

Cluster Role Binding:

$ kubectl get clusterrolebinding datadog-agent-cluster-agent -o yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/instance: datadog-datadog--agent
    app.kubernetes.io/managed-by: datadog-operator
    app.kubernetes.io/name: datadog-agent-deployment
    app.kubernetes.io/part-of: datadog-datadog--agent
    app.kubernetes.io/version: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent-cluster-agent
subjects:
- kind: ServiceAccount
  name: datadog-agent-cluster-agent

The issue can be fixed by binding an additional role with enough permissions to the cluster agent's service account, e.g.:

$ kubectl create clusterrolebinding datadog-cluster-agent-admin --clusterrole=cluster-admin --serviceaccount=datadog:datadog-agent-cluster-agent
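Binding cluster-admin grants far more than the failing check needs. A narrower alternative, as an untested sketch and not a fix confirmed by the maintainers: create a namespaced Role that allows only get and update on the datadogtoken ConfigMap, and bind it to the same service account. The name datadogtoken-updater is hypothetical. A separate Role is used here on the assumption that the operator may overwrite manual changes to the ClusterRole it manages.

```shell
# Values taken from the error message; ROLE_NAME is a hypothetical name.
NAMESPACE=datadog
SERVICE_ACCOUNT=datadog-agent-cluster-agent
ROLE_NAME=datadogtoken-updater

# Role limited to a single ConfigMap in the agent's namespace.
kubectl create role "$ROLE_NAME" \
  --namespace "$NAMESPACE" \
  --verb=get,update \
  --resource=configmaps \
  --resource-name=datadogtoken

# Bind the Role to the cluster agent's service account.
kubectl create rolebinding "$ROLE_NAME" \
  --namespace "$NAMESPACE" \
  --role="$ROLE_NAME" \
  --serviceaccount="${NAMESPACE}:${SERVICE_ACCOUNT}"
```

Adding --dry-run=client -o yaml to either command prints the manifest instead of creating the object, which makes it easier to review before applying.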

Describe what you expected: I expected to see kubernetes_state or kubernetes_state_core metrics in my Datadog Metrics Explorer.

Steps to reproduce the issue: Deploy the Datadog Operator Helm chart:

helm repo add datadog https://helm.datadoghq.com
helm install datadog datadog/datadog-operator

Create a Kubernetes secret with your API and APP keys:

kubectl create secret generic datadog-secret --from-literal api-key=<DATADOG_API_KEY> --from-literal app-key=<DATADOG_APP_KEY>

Deploy DatadogAgent:

# Source: datadog-agent/templates/datadog-agent.yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: datadog-agent
spec:
  credentials:
    apiSecret:
      secretName: datadog-secret
      keyName: api-key
    appSecret:
      secretName: datadog-secret
      keyName: app-key
  agent:
    image:
      name: gcr.io/datadoghq/agent:7.38.0
    config:
      volumes:
        - name: k8s-certs
          hostPath:
            path: /etc/kubernetes/certs
            type: ''
      volumeMounts:
        - name: k8s-certs
          readOnly: true
          mountPath: /etc/kubernetes/certs
      kubelet:
        hostCAPath: /etc/kubernetes/certs/kubeletserver.crt
      criSocket:
        criSocketPath: /var/run/containerd/containerd.sock
      env:
        - name: DD_KUBELET_CLIENT_CA
          value: "/etc/kubernetes/certs/kubeletserver.crt"
        - name: DD_KUBELET_TLS_VERIFY
          value: "false"
        - name: DD_DOGSTATSD_PORT
          value: "8125"
        - name: DD_DOGSTATSD_NON_LOCAL_TRAFFIC
          value: "true"
        - name: DD_SITE
          value: "datadoghq.com"
        - name: DD_CONTAINER_EXCLUDE_LOGS
          value: "kube_namespace:.*"
      tolerations:
        - operator: Exists
    apm:
      enabled: true
      hostPort: 8126
      env:
      - name: DD_APM_NON_LOCAL_TRAFFIC
        value: "true"
      - name: DD_APM_RECEIVER_PORT
        value: "8126"
    process:
      enabled: true
      processCollectionEnabled: true
    log:
      enabled: true
      logsConfigContainerCollectAll: true
    systemProbe:
      bpfDebugEnabled: true
    security:
      compliance:
        enabled: true
      runtime:
        enabled: false
  clusterAgent:
    replicas: 2
    image:
      name: gcr.io/datadoghq/cluster-agent:1.22.0
    config:
      externalMetrics:
        enabled: true
        useDatadogMetrics: true
      clusterChecksEnabled: true
      admissionController:
        enabled: true
      env:
        - name: DD_COLLECT_KUBERNETES_EVENTS
          value: "true"
        - name: DD_SITE
          value: "datadoghq.com"
  features:
    prometheusScrape:
      enabled: true
    kubeStateMetricsCore:
      enabled: true

Additional environment details (Operating System, Cloud provider, etc): Cloud: Azure; Platform: AKS

lambohamp, Aug 03 '22 06:08