Cluster agent cannot update the `datadogtoken` ConfigMap
Describe what happened:
I deployed the Datadog Operator Helm chart version 0.8.5 and a DatadogAgent resource. The cluster agent doesn't send kubernetes_state (or kubernetes_state_core) metrics to Datadog. Running `agent status` on the cluster agent shows the error below for the kubernetes_apiserver check.
Output of the cluster agent's status:
===============================
Datadog Cluster Agent (v1.22.0)
===============================
Status date: 2022-08-03 06:01:08.884 UTC (1659506468884)
Agent start: 2022-08-03 05:40:47.19 UTC (1659505247190)
Pid: 1
Go Version: go1.17.11
Build arch: amd64
Agent flavor: cluster_agent
Check Runners: 8
Log Level: INFO
...
Leader Election
===============
Leader Election Status: Running
Leader Name is: datadog-agent-cluster-agent-7bdd857659-4jdr4
Last Acquisition of the lease: Wed, 03 Aug 2022 05:41:57 UTC
Renewed leadership: Wed, 03 Aug 2022 06:01:00 UTC
Number of leader transitions: 6 transitions
...
=========
Collector
=========
Running Checks
==============
...
kubernetes_apiserver
--------------------
Instance ID: kubernetes_apiserver [WARNING]
Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
Total Runs: 82
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 3, Total: 91
Service Checks: Last Run: 3, Total: 231
Average Execution Time : 1.995s
Last Execution Date : 2022-08-03 06:01:06 UTC (1659506466000)
Last Successful Execution Date : 2022-08-03 06:01:06 UTC (1659506466000)
Error: configmaps "datadogtoken" is forbidden: User "system:serviceaccount:datadog:datadog-agent-cluster-agent" cannot update resource "configmaps" in API group "" in the namespace "datadog": Azure does not have opinion for this user.
No traceback
kubernetes_state_core
---------------------
Instance ID: kubernetes_state_core:476f5ca87ed00165 [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_state_core.d/kubernetes_state_core.yaml.default
Total Runs: 81
Metric Samples: Last Run: 5,754, Total: 437,390
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 27, Total: 2,052
Average Execution Time : 41ms
Last Execution Date : 2022-08-03 06:00:56 UTC (1659506456000)
Last Successful Execution Date : 2022-08-03 06:00:56 UTC (1659506456000)
...
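The ClusterRole created for the cluster agent grants the `update` verb on the `datadog-leader-election`, `datadog-custom-metrics`, and `datadog-cluster-id` ConfigMaps, but it doesn't seem to grant it on the `datadogtoken` ConfigMap.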
Service Account:
$ kubectl get sa datadog-agent-cluster-agent -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/instance: datadog-agent
    app.kubernetes.io/managed-by: datadog-operator
    app.kubernetes.io/name: datadog-agent-deployment
    app.kubernetes.io/part-of: datadog-datadog--agent
    app.kubernetes.io/version: ""
  name: datadog-agent-cluster-agent
  ownerReferences:
  - apiVersion: datadoghq.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DatadogAgent
    name: datadog-agent
secrets:
- name: datadog-agent-cluster-agent-token-pdwxm
Cluster Role:
$ kubectl get clusterrole datadog-agent-cluster-agent -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/instance: datadog-datadog--agent
    app.kubernetes.io/managed-by: datadog-operator
    app.kubernetes.io/name: datadog-agent-deployment
    app.kubernetes.io/part-of: datadog-datadog--agent
    app.kubernetes.io/version: ""
  name: datadog-agent-cluster-agent
rules:
- apiGroups:
  - ""
  resources:
  - services
  - events
  - endpoints
  - pods
  - nodes
  - componentstatuses
  - configmaps
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - quota.openshift.io
  resources:
  - clusterresourcequotas
  verbs:
  - get
  - list
- nonResourceURLs:
  - /version
  - /healthz
  verbs:
  - get
- apiGroups:
  - ""
  resourceNames:
  - datadog-leader-election
  resources:
  - configmaps
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resourceNames:
  - kube-system
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
  - ""
  resourceNames:
  - datadog-custom-metrics
  resources:
  - configmaps
  verbs:
  - get
  - update
- apiGroups:
  - ""
  resourceNames:
  - extension-apiserver-authentication
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
  - get
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
- apiGroups:
  - datadoghq.com
  resources:
  - datadogmetrics
  verbs:
  - list
  - watch
  - create
  - delete
- apiGroups:
  - datadoghq.com
  resources:
  - datadogmetrics/status
  verbs:
  - update
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  verbs:
  - get
  - list
  - watch
  - create
  - update
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
  - watch
  - create
  - update
- apiGroups:
  - datadoghq.com
  resources:
  - extendeddaemonsetreplicasets
  verbs:
  - get
- apiGroups:
  - apps
  resources:
  - deployments
  - replicasets
  - statefulsets
  - daemonsets
  verbs:
  - get
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - list
  - watch
  - get
- apiGroups:
  - batch
  resources:
  - cronjobs
  verbs:
  - list
  - watch
  - get
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  - namespaces
  verbs:
  - list
- apiGroups:
  - policy
  resources:
  - podsecuritypolicies
  verbs:
  - list
  - get
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - rolebindings
  verbs:
  - list
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - list
- apiGroups:
  - ""
  resourceNames:
  - kube-system
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
  - ""
  resourceNames:
  - datadog-cluster-id
  resources:
  - configmaps
  verbs:
  - get
  - create
  - update
- apiGroups:
  - apps
  resources:
  - deployments
  - replicasets
  - daemonsets
  - statefulsets
  verbs:
  - get
  - list
  - watch
Cluster Role Binding:
$ kubectl get clusterrolebinding datadog-agent-cluster-agent -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/instance: datadog-datadog--agent
    app.kubernetes.io/managed-by: datadog-operator
    app.kubernetes.io/name: datadog-agent-deployment
    app.kubernetes.io/part-of: datadog-datadog--agent
    app.kubernetes.io/version: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent-cluster-agent
subjects:
- kind: ServiceAccount
  name: datadog-agent-cluster-agent
As a workaround, the error goes away when an additional role with enough permissions is bound to the cluster agent's service account, although cluster-admin is much broader than required:
$ kubectl create clusterrolebinding datadog-cluster-agent-admin --clusterrole=cluster-admin --serviceaccount=datadog:datadog-agent-cluster-agent
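A narrower option than cluster-admin could be a namespaced Role limited to the one ConfigMap named in the error. The manifest below is a sketch under assumptions, not the operator's own fix: the Role and RoleBinding name `datadog-cluster-agent-token` is hypothetical, the `datadog` namespace and the service account name are taken from the error message, and the `get`/`update` verbs mirror what the existing ClusterRole already grants on the other named ConfigMaps.

```yaml
# Sketch only: lets the cluster agent read and update the datadogtoken ConfigMap.
# The name "datadog-cluster-agent-token" is hypothetical; adjust to your conventions.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: datadog-cluster-agent-token
  namespace: datadog            # namespace from the error message
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  resourceNames:
  - datadogtoken                # only the ConfigMap the check failed on
  verbs:
  - get
  - update
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: datadog-cluster-agent-token
  namespace: datadog
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: datadog-cluster-agent-token
subjects:
- kind: ServiceAccount
  name: datadog-agent-cluster-agent   # service account from the error message
  namespace: datadog
```

After applying it with `kubectl apply -f <file>`, running `kubectl auth can-i update configmaps/datadogtoken --as=system:serviceaccount:datadog:datadog-agent-cluster-agent -n datadog` should report whether the permission is in place. Creating the ConfigMap is already covered by the unrestricted `create` rule on configmaps in the existing ClusterRole.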
Describe what you expected:
I expected to see kubernetes_state or kubernetes_state_core metrics in my Datadog Metrics Explorer.
Steps to reproduce the issue:
Deploy the Datadog Operator Helm chart:
helm repo add datadog https://helm.datadoghq.com
helm install datadog datadog/datadog-operator
Create a Kubernetes secret with your API and APP keys:
kubectl create secret generic datadog-secret --from-literal api-key=<DATADOG_API_KEY> --from-literal app-key=<DATADOG_APP_KEY>
Deploy DatadogAgent:
# Source: datadog-agent/templates/datadog-agent.yaml
apiVersion: datadoghq.com/v1alpha1
kind: DatadogAgent
metadata:
  name: datadog-agent
spec:
  credentials:
    apiSecret:
      secretName: datadog-secret
      keyName: api-key
    appSecret:
      secretName: datadog-secret
      keyName: app-key
  agent:
    image:
      name: gcr.io/datadoghq/agent:7.38.0
    config:
      volumes:
        - name: k8s-certs
          hostPath:
            path: /etc/kubernetes/certs
            type: ''
      volumeMounts:
        - name: k8s-certs
          readOnly: true
          mountPath: /etc/kubernetes/certs
      kubelet:
        hostCAPath: /etc/kubernetes/certs/kubeletserver.crt
      criSocket:
        criSocketPath: /var/run/containerd/containerd.sock
      env:
        - name: DD_KUBELET_CLIENT_CA
          value: "/etc/kubernetes/certs/kubeletserver.crt"
        - name: DD_KUBELET_TLS_VERIFY
          value: "false"
        - name: DD_DOGSTATSD_PORT
          value: "8125"
        - name: DD_DOGSTATSD_NON_LOCAL_TRAFFIC
          value: "true"
        - name: DD_SITE
          value: "datadoghq.com"
        - name: DD_CONTAINER_EXCLUDE_LOGS
          value: "kube_namespace:.*"
      tolerations:
        - operator: Exists
    apm:
      enabled: true
      hostPort: 8126
      env:
        - name: DD_APM_NON_LOCAL_TRAFFIC
          value: "true"
        - name: DD_APM_RECEIVER_PORT
          value: "8126"
    process:
      enabled: true
      processCollectionEnabled: true
    log:
      enabled: true
      logsConfigContainerCollectAll: true
    systemProbe:
      bpfDebugEnabled: true
    security:
      compliance:
        enabled: true
      runtime:
        enabled: false
  clusterAgent:
    replicas: 2
    image:
      name: gcr.io/datadoghq/cluster-agent:1.22.0
    config:
      externalMetrics:
        enabled: true
        useDatadogMetrics: true
      clusterChecksEnabled: true
      admissionController:
        enabled: true
      env:
        - name: DD_COLLECT_KUBERNETES_EVENTS
          value: "true"
        - name: DD_SITE
          value: "datadoghq.com"
  features:
    prometheusScrape:
      enabled: true
    kubeStateMetricsCore:
      enabled: true
Additional environment details (Operating System, Cloud provider, etc):
Cloud: Azure
Platform: AKS