integrations-core

Add Telemetry gatherer for database integrations

Open sethsamuel opened this issue 1 year ago • 1 comment

What does this PR do?

Adds a new Telemetry gatherer for database integrations. This component emits a debounced set of events once per minute to the dbm-metrics endpoint. See this PR for handling of the events.
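As a rough illustration of what "a debounced set of events once per minute" could look like, here is a minimal sketch that aggregates events in memory and flushes at most once per interval. It is not the code from this PR: the class name `DebouncedTelemetry`, the `submit` callback, the event fields, and the injectable `clock` are all hypothetical, and the technique shown is plain interval-based batching, which is one possible reading of "debounced".

```python
import time
from collections import defaultdict


class DebouncedTelemetry:
    """Hypothetical sketch: aggregate events, flush at most once per interval."""

    def __init__(self, submit, interval_seconds=60.0, clock=time.monotonic):
        # `submit` is an assumed callable that receives one list of event dicts,
        # standing in for whatever actually posts to the dbm-metrics endpoint.
        self._submit = submit
        self._interval = interval_seconds
        self._clock = clock
        self._last_flush = clock()
        self._pending = defaultdict(lambda: {"count": 0, "total": 0.0})

    def record(self, name, value):
        """Fold one observation into the pending aggregate for `name`."""
        bucket = self._pending[name]
        bucket["count"] += 1
        bucket["total"] += value
        self._maybe_flush()

    def _maybe_flush(self):
        """Submit the aggregated batch if the interval has elapsed."""
        now = self._clock()
        if now - self._last_flush < self._interval:
            return
        events = [
            {"name": name, "count": bucket["count"], "total": bucket["total"]}
            for name, bucket in sorted(self._pending.items())
        ]
        self._pending.clear()
        self._last_flush = now
        if events:
            self._submit(events)
```

Keeping only a count and a running total per event name is one way to keep the payload small and low in cardinality, in line with the stated motivation; the real component may aggregate differently.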

Motivation

We want to gather (very coarse) cross-org metrics on our integration performance. These metrics are kept intentionally low in cardinality and frequency, but should be sufficient to flag problems with agents collecting too many rows or taking too long to gather information.

Additional Notes

These events are very small and emitted far less frequently than the database integration events they are monitoring. Performance impact should be immeasurably low, but there is a hidden config option (enable_telemetry) that can be set to false if there are unexpected problems.
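A kill switch like this could be checked as in the sketch below. Only the option name `enable_telemetry` comes from the description; the helper name `telemetry_enabled`, the idea that the option is read from a config mapping, and the default of enabled (inferred from "can be set to false") are assumptions.

```python
def telemetry_enabled(config):
    """Hypothetical helper: return False only when enable_telemetry is turned off.

    Assumes the option defaults to enabled when it is absent, and accepts
    common string spellings of "false" that may appear in YAML-derived config.
    """
    value = config.get("enable_telemetry", True)
    if isinstance(value, str):
        return value.strip().lower() not in ("false", "no", "0", "off")
    return bool(value)
```

A caller would then skip recording entirely when `telemetry_enabled(config)` is false, so a disabled gatherer adds no work to the check run.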

Review checklist (to be filled by reviewers)

  • [ ] Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • [ ] Changelog entries must be created for modifications to shipped code
  • [ ] Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • [ ] If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

sethsamuel, Jun 13 '24 15:06

Codecov Report

Attention: Patch coverage is 77.94118% with 15 lines in your changes missing coverage. Please review.

Project coverage is 89.22%. Comparing base (b9cef75) to head (5eeef3e). Report is 384 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
active_directory 100.00% <ø> (+27.27%) :arrow_up:
activemq 52.80% <ø> (ø)
activemq_xml 82.31% <ø> (ø)
airflow 92.20% <ø> (?)
amazon_msk 88.91% <ø> (ø)
ambari 85.80% <ø> (ø)
apache 95.08% <ø> (ø)
arangodb 98.23% <ø> (ø)
argo_rollouts 90.00% <ø> (ø)
argo_workflows 87.87% <ø> (ø)
argocd 87.81% <ø> (ø)
aspdotnet 100.00% <ø> (ø)
avi_vantage 91.35% <ø> (ø)
azure_iot_edge 82.08% <ø> (ø)
boundary 100.00% <ø> (ø)
btrfs 82.91% <ø> (ø)
cacti 87.90% <ø> (ø)
calico 84.61% <ø> (ø)
cassandra 66.66% <ø> (ø)
cert_manager 77.41% <ø> (ø)
cilium 78.20% <ø> (?)
cisco_aci 95.31% <ø> (ø)
citrix_hypervisor 87.50% <ø> (ø)
cloud_foundry_api 96.11% <ø> (ø)
cloudera 99.51% <ø> (ø)
cockroachdb 93.19% <ø> (ø)
consul 91.82% <ø> (ø)
coredns 94.61% <ø> (ø)
couch 94.74% <ø> (ø)
crio 89.79% <ø> (ø)
datadog_checks_base 89.60% <77.94%> (+0.78%) :arrow_up:
datadog_checks_dev 77.38% <ø> (+0.07%) :arrow_up:
datadog_checks_downloader 81.37% <ø> (ø)
datadog_cluster_agent 90.19% <ø> (ø)
dcgm 92.10% <ø> (ø)
ddev 87.95% <ø> (ø)
directory 95.68% <ø> (+0.64%) :arrow_up:
disk 89.34% <ø> (ø)
dns_check 93.33% <ø> (ø)
druid 97.70% <ø> (ø)
ecs_fargate 83.52% <ø> (ø)
eks_fargate 94.05% <ø> (ø)
envoy 92.78% <ø> (-2.12%) :arrow_down:
esxi 92.89% <ø> (ø)
etcd 95.56% <ø> (ø)
external_dns 89.28% <ø> (ø)
fluentd 84.32% <ø> (ø)
fluxcd 88.31% <ø> (ø)
foundationdb 83.83% <ø> (ø)
gearmand ?
gitlab_runner 92.10% <ø> (ø)
go_expvar 92.73% <ø> (ø)
gunicorn 92.83% <ø> (+0.75%) :arrow_up:
harbor ?
hazelcast 92.39% <ø> (ø)
hdfs_datanode 89.74% <ø> (ø)
hdfs_namenode 86.72% <ø> (ø)
hive 51.42% <ø> (ø)
hivemq 61.90% <ø> (ø)
http_check 95.32% <ø> (+2.02%) :arrow_up:
hudi 73.91% <ø> (ø)
ibm_ace 92.25% <ø> (?)
ibm_db2 86.87% <ø> (ø)
ibm_i 81.91% <ø> (ø)
ibm_mq 91.28% <ø> (ø)
ibm_was ?
ignite 46.66% <ø> (ø)
impala 97.97% <ø> (ø)
istio 78.14% <ø> (+0.51%) :arrow_up:
jboss_wildfly 47.36% <ø> (ø)
kafka 64.70% <ø> (ø)
karpenter 94.36% <ø> (ø)
kong 87.62% <ø> (ø)
kube_apiserver_metrics 97.74% <ø> (ø)
kube_controller_manager 97.89% <ø> (ø)
kube_dns 95.97% <ø> (ø)
kube_metrics_server 94.87% <ø> (ø)
kube_proxy 96.80% <ø> (ø)
kube_scheduler 97.92% <ø> (ø)
kubelet 91.01% <ø> (ø)
kubernetes_cluster_autoscaler 93.22% <ø> (ø)
kubernetes_state 89.50% <ø> (ø)
kyototycoon 85.96% <ø> (ø)
lighttpd 83.64% <ø> (ø)
linkerd 85.22% <ø> (+1.13%) :arrow_up:
linux_proc_extras 96.22% <ø> (ø)
mapr 82.42% <ø> (ø)
mapreduce 82.08% <ø> (ø)
marathon 83.12% <ø> (ø)
mcache 93.50% <ø> (ø)
mesos_master 89.81% <ø> (ø)
mesos_slave 93.31% <ø> (ø)
mysql ?
nagios 89.01% <ø> (ø)
network 93.64% <ø> (+1.08%) :arrow_up:
nfsstat 95.20% <ø> (ø)
nginx 95.07% <ø> (+0.53%) :arrow_up:
nginx_ingress_controller 98.36% <ø> (ø)
nvidia_triton 88.52% <ø> (ø)
openldap 96.33% <ø> (ø)
openmetrics 98.08% <ø> (ø)
openstack 55.19% <ø> (ø)
openstack_controller 94.41% <ø> (?)
pgbouncer 91.35% <ø> (ø)
php_fpm 90.53% <ø> (+0.82%) :arrow_up:
postfix 88.10% <ø> (ø)
postgres 84.74% <ø> (+7.88%) :arrow_up:
powerdns_recursor 96.65% <ø> (ø)
presto 59.09% <ø> (ø)
process 85.28% <ø> (+0.28%) :arrow_up:
prometheus 94.17% <ø> (ø)
proxysql 98.97% <ø> (ø)
pulsar 100.00% <ø> (ø)
rabbitmq 95.37% <ø> (ø)
ray 96.45% <ø> (ø)
redisdb 88.07% <ø> (ø)
rethinkdb 97.93% <ø> (ø)
riak 99.21% <ø> (ø)
riakcs 87.71% <ø> (ø)
silk 93.82% <ø> (ø)
singlestore 90.81% <ø> (ø)
snowflake 96.27% <ø> (ø)
solr 56.25% <ø> (ø)
spark 94.14% <ø> (+0.27%) :arrow_up:
sqlserver ?
squid 100.00% <ø> (ø)
statsd 87.36% <ø> (ø)
strimzi 89.78% <ø> (ø)
supervisord 89.78% <ø> (ø)
system_core 92.66% <ø> (ø)
system_swap 98.30% <ø> (ø)
tcp_check 91.58% <ø> (ø)
teamcity 88.57% <ø> (+3.17%) :arrow_up:
tekton 82.30% <ø> (ø)
teleport 99.61% <ø> (ø)
temporal 100.00% <ø> (ø)
teradata 94.05% <ø> (ø)
tls 92.02% <ø> (+0.86%) :arrow_up:
tokumx 57.52% <ø> (ø)
tomcat 60.41% <ø> (?)
torchserve 97.32% <ø> (ø)
traefik_mesh 76.75% <ø> (ø)
traffic_server 96.13% <ø> (ø)
twemproxy 79.56% <ø> (ø)
twistlock 80.47% <ø> (ø)
varnish 84.39% <ø> (+0.26%) :arrow_up:
voltdb ?
vsphere ?
weaviate 76.27% <ø> (ø)
win32_event_log 82.67% <ø> (+1.11%) :arrow_up:
wmi_check 97.50% <ø> (ø)
yarn 89.52% <ø> (ø)
zk ?

Flags with carried forward coverage won't be shown.

codecov[bot], Jun 13 '24 15:06

We're going with the official agent telemetry solution here

sethsamuel, Aug 28 '24 14:08