Fix separation of tags between metrics, logs, and traces
What does this PR do?
- `_dd.compute_stats` is a marker for agent-side trace stats computation. We should isolate `_dd.compute_stats` to the `serverless-init` trace agent.
- Deal with the fact that we have both `DD_SERVERLESS_INIT_ENABLE_BACKEND_TRACE_STATS` and `DD_SERVERLESS_INIT_DISABLE_TRACE_STATS`
- Remove the unused `BuildTracerTags` function for clarity
- De-duplicate use of configured tags
- Rename all tag modification code into a generalized place for agent tag changes, and move the relevant functions into the setup function
- Update comments and tests
Motivation
- `_dd.compute_stats` should not show on logs or as a tag available for runtime metrics.
- If `DD_SERVERLESS_INIT_ENABLE_BACKEND_TRACE_STATS` is true, `_dd.compute_stats` should show on traces.
- If `DD_SERVERLESS_INIT_ENABLE_BACKEND_TRACE_STATS` is true and `DD_SERVERLESS_INIT_DISABLE_TRACE_STATS` is false, we should have accurate metric/trace data.
- If `DD_SERVERLESS_INIT_ENABLE_BACKEND_TRACE_STATS` is true and `DD_SERVERLESS_INIT_DISABLE_TRACE_STATS` is false, AND sampling is <1.0, we should have inaccurate metric/trace data.
- Consistent pattern of tag modification
Describe how you validated your changes
- [x] Updated existing unit test
- [x] Test In-Container for `cloudrun`
- [x] Test In-Container for `cloudrunjobs`
- [x] Test In-Container for `appservice`
- [x] Test In-Container for `containerapp`

Using the `lewis/SVLS-4573/serverless-init-test-compute-stats` self-monitoring branch:

- [x] Test Sidecar for `cloudrun` in self-monitoring
- [x] Test Sidecar for `appservice` in self-monitoring
- [x] Test Sidecar for `containerapp` in self-monitoring
Testing changes to the Azure App Services Extension
- [x] New unit test
- [ ] Test in AAS
Static quality checks
✅ Please find below the results from the static quality gates. Comparison made with ancestor 2441ad422ccb33dadd042c663452f1b07c2af1ed.
Successful checks
| | Quality gate | On disk Δ (MiB) | On disk size (MiB) | On wire Δ (MiB) | On wire size (MiB) |
|---|---|---|---|---|---|
| ✅ | agent_deb_amd64 | $${0}$$ | $${705.41}$$ < $${707.13}$$ | $${+0.01}$$ | $${172.99}$$ < $${174.16}$$ |
| ✅ | agent_deb_amd64_fips | $${0}$$ | $${701.64}$$ < $${702.86}$$ | $${-0.05}$$ | $${172.21}$$ < $${173.5}$$ |
| ✅ | agent_heroku_amd64 | $${0}$$ | $${327.72}$$ < $${327.92}$$ | $${-0}$$ | $${87.16}$$ < $${88.21}$$ |
| ✅ | agent_msi | $${0}$$ | $${571.12}$$ < $${982.08}$$ | $${-0.03}$$ | $${142.65}$$ < $${143.49}$$ |
| ✅ | agent_rpm_amd64 | $${0}$$ | $${705.4}$$ < $${707.12}$$ | $${-0}$$ | $${176.11}$$ < $${177.29}$$ |
| ✅ | agent_rpm_amd64_fips | $${0}$$ | $${701.62}$$ < $${702.85}$$ | $${+0.03}$$ | $${175.1}$$ < $${176.32}$$ |
| ✅ | agent_rpm_arm64 | $${0}$$ | $${687.27}$$ < $${691.67}$$ | $${+0.04}$$ | $${159.1}$$ < $${160.7}$$ |
| ✅ | agent_rpm_arm64_fips | $${0}$$ | $${684.33}$$ < $${687.54}$$ | $${-0.01}$$ | $${158.77}$$ < $${160.11}$$ |
| ✅ | agent_suse_amd64 | $${0}$$ | $${705.4}$$ < $${707.12}$$ | $${-0}$$ | $${176.11}$$ < $${177.29}$$ |
| ✅ | agent_suse_amd64_fips | $${0}$$ | $${701.62}$$ < $${702.85}$$ | $${+0.03}$$ | $${175.1}$$ < $${176.32}$$ |
| ✅ | agent_suse_arm64 | $${0}$$ | $${687.27}$$ < $${691.67}$$ | $${+0.04}$$ | $${159.1}$$ < $${160.7}$$ |
| ✅ | agent_suse_arm64_fips | $${0}$$ | $${684.33}$$ < $${687.54}$$ | $${-0.01}$$ | $${158.77}$$ < $${160.11}$$ |
| ✅ | docker_agent_amd64 | $${-0}$$ | $${767.2}$$ < $${769.38}$$ | $${-0}$$ | $${260.51}$$ < $${262.06}$$ |
| ✅ | docker_agent_arm64 | $${-0}$$ | $${773.64}$$ < $${778.35}$$ | $${+0.01}$$ | $${249.62}$$ < $${251.72}$$ |
| ✅ | docker_agent_jmx_amd64 | $${+0}$$ | $${958.08}$$ < $${960.26}$$ | $${-0.01}$$ | $${329.14}$$ < $${330.69}$$ |
| ✅ | docker_agent_jmx_arm64 | $${-0}$$ | $${953.24}$$ < $${957.95}$$ | $${+0}$$ | $${314.24}$$ < $${316.35}$$ |
| ✅ | docker_cluster_agent_amd64 | $${-0}$$ | $${180.33}$$ < $${181.08}$$ | $${-0}$$ | $${63.71}$$ < $${64.49}$$ |
| ✅ | docker_cluster_agent_arm64 | $${+0}$$ | $${196.19}$$ < $${198.49}$$ | $${+0}$$ | $${60.0}$$ < $${61.17}$$ |
| ✅ | docker_cws_instrumentation_amd64 | $${-0}$$ | $${7.13}$$ < $${7.18}$$ | $${-0}$$ | $${2.99}$$ < $${3.33}$$ |
| ✅ | docker_cws_instrumentation_arm64 | $${0}$$ | $${6.69}$$ < $${6.92}$$ | $${+0}$$ | $${2.73}$$ < $${3.09}$$ |
| ✅ | docker_dogstatsd_amd64 | $${0}$$ | $${38.76}$$ < $${39.38}$$ | $${-0}$$ | $${15.0}$$ < $${15.82}$$ |
| ✅ | docker_dogstatsd_arm64 | $${0}$$ | $${37.06}$$ < $${37.94}$$ | $${-0}$$ | $${14.33}$$ < $${14.83}$$ |
| ✅ | dogstatsd_deb_amd64 | $${0}$$ | $${29.98}$$ < $${30.61}$$ | $${-0}$$ | $${7.93}$$ < $${8.79}$$ |
| ✅ | dogstatsd_deb_arm64 | $${0}$$ | $${28.14}$$ < $${29.11}$$ | $${-0}$$ | $${6.81}$$ < $${7.71}$$ |
| ✅ | dogstatsd_rpm_amd64 | $${0}$$ | $${29.98}$$ < $${30.61}$$ | $${-0}$$ | $${7.94}$$ < $${8.8}$$ |
| ✅ | dogstatsd_suse_amd64 | $${0}$$ | $${29.98}$$ < $${30.61}$$ | $${-0}$$ | $${7.94}$$ < $${8.8}$$ |
| ✅ | iot_agent_deb_amd64 | $${0}$$ | $${42.92}$$ < $${43.29}$$ | $${+0}$$ | $${11.23}$$ < $${12.04}$$ |
| ✅ | iot_agent_deb_arm64 | $${0}$$ | $${40.05}$$ < $${40.92}$$ | $${-0}$$ | $${9.61}$$ < $${10.45}$$ |
| ✅ | iot_agent_deb_armhf | $${0}$$ | $${40.63}$$ < $${41.03}$$ | $${-0}$$ | $${9.8}$$ < $${10.62}$$ |
| ✅ | iot_agent_rpm_amd64 | $${0}$$ | $${42.92}$$ < $${43.29}$$ | $${-0}$$ | $${11.25}$$ < $${12.06}$$ |
| ✅ | iot_agent_suse_amd64 | $${0}$$ | $${42.92}$$ < $${43.29}$$ | $${-0}$$ | $${11.25}$$ < $${12.06}$$ |
Regression Detector
Regression Detector Results
Metrics dashboard
Target profiles
Run ID: 337d643c-2f3e-4e92-b0a5-b5c95da9ebfd
Baseline: bb99c82e2fa90eae1f0de48071e02d3416b6f230 Comparison: 080a59dd6da254eb0366ea26399f1adc90cd2ef7 Diff
Optimization Goals: ✅ No significant changes detected
Experiments ignored for regressions
Regressions in experiments with settings containing `erratic: true` are ignored.
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +0.47 | [-2.57, +3.52] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | quality_gate_metrics_logs | memory utilization | +1.63 | [+1.41, +1.86] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_logs | memory utilization | +0.81 | [+0.73, +0.88] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.53 | [+0.50, +0.57] | 1 | Logs bounds checks dashboard |
| ➖ | docker_containers_cpu | % cpu utilization | +0.47 | [-2.57, +3.52] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.33 | [+0.27, +0.38] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | +0.06 | [-0.17, +0.29] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.04 | [-0.39, +0.46] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | +0.01 | [-0.11, +0.13] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.00 | [-0.41, +0.41] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.08, +0.07] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | -0.00 | [-0.13, +0.12] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | -0.02 | [-0.07, +0.03] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.05 | [-0.43, +0.32] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | -0.06 | [-0.26, +0.15] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | -0.10 | [-0.14, -0.05] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics | memory utilization | -0.12 | [-0.35, +0.11] | 1 | Logs |
| ➖ | file_tree | memory utilization | -0.24 | [-0.31, -0.17] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | -0.27 | [-0.37, -0.17] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.32 | [-0.47, -0.16] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.42 | [-0.50, -0.35] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | -0.46 | [-1.94, +1.02] | 1 | Logs bounds checks dashboard |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -0.69 | [-0.77, -0.61] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.85 | [-1.01, -0.69] | 1 | Logs |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | links |
|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
Hi there, thanks for this PR. I've added a wip tag since it seems this is still being drafted. Please let us know once this is ready for review and we'll be happy to review! Thanks so much and let us know if you have any questions.
how are we testing this? should we have some kind of self-monitoring alert for the resulting tagged things?
@apiarian-datadog There's a checklist for manual tests in the PR description under Describe how you validated your changes. Unchecked them all when going back into draft, and will re-test with the new code. Are there existing practices around monitors + combinations of env vars in self monitoring, given env vars can change per-deploy? I see there's some compute stats related code in the e2e tests. I have that repo on my to-onboard list, so adding tests there seems like a good next step to me.
i think the existing trace stats tests should cover your trace stats needs on this one.
i suppose it might be worth adding a small test that creates an app with some tags and confirms that those tags (and the additional automatic ones) end up on the metrics, traces, and logs that we ingest. or perhaps this is an element of self-monitoring.
/merge
View all feedbacks in Devflow UI.
2025-12-11 18:32:22 UTC :information_source: Start processing command /merge
2025-12-11 18:32:29 UTC :information_source: MergeQueue: waiting for PR to be ready
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.
2025-12-11 22:33:11 UTC :warning: MergeQueue: This merge request was unqueued
devflow unqueued this merge request: It did not become mergeable within the expected time
Go Package Import Differences
Baseline: 2441ad422ccb33dadd042c663452f1b07c2af1ed Comparison: 080a59dd6da254eb0366ea26399f1adc90cd2ef7
| binary | os | arch | change |
|---|---|---|---|
| trace-agent | linux | amd64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
| trace-agent | linux | arm64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
| trace-agent | windows | amd64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
| trace-agent | darwin | amd64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
| trace-agent | darwin | arm64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
| heroku-trace-agent | linux | amd64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
| otel-agent | linux | amd64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
| otel-agent | linux | arm64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
| full-host-profiler | linux | amd64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
| full-host-profiler | linux | arm64 | +0, -1: `-github.com/DataDog/datadog-agent/pkg/serverless/tags` |
/merge
View all feedbacks in Devflow UI.
2025-12-18 16:19:31 UTC :information_source: Start processing command /merge
2025-12-18 16:19:37 UTC :information_source: MergeQueue: pull request added to the queue
The expected merge time in main is approximately 1h (p90).
2025-12-18 17:16:35 UTC :information_source: MergeQueue: This merge request was merged