Add support for filtering tags in k8s workloads.
### What does this PR do?
Just add the annotation `tags.datadog.com/disable: "comma,separated,tags,to,be,disabled"`.
Add support in `taglist` to remove tags.
The idea is to use this before calling the `tags.Compute()` method. The current implementation is done for k8s workloads, but it could be expanded to other places.
(PS: I am still learning Go, please forgive any mistakes.)
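To make the idea concrete, here is a rough sketch in Go of how the annotation value could be parsed and applied before `tags.Compute()`. The function names here are hypothetical illustrations, not the actual identifiers in the Agent codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// parseDisabledTags splits the value of the tags.datadog.com/disable
// annotation ("comma,separated,tags") into a lookup set.
func parseDisabledTags(annotation string) map[string]struct{} {
	disabled := make(map[string]struct{})
	for _, name := range strings.Split(annotation, ",") {
		if name = strings.TrimSpace(name); name != "" {
			disabled[name] = struct{}{}
		}
	}
	return disabled
}

// filterTags drops any "key:value" tag whose key is in the disabled set.
func filterTags(tags []string, disabled map[string]struct{}) []string {
	kept := tags[:0:0] // fresh empty slice of the same type
	for _, t := range tags {
		key, _, _ := strings.Cut(t, ":")
		if _, skip := disabled[key]; !skip {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	disabled := parseDisabledTags("pod_name,kube_service")
	tags := []string{"pod_name:web-7f9c", "kube_namespace:prod", "kube_service:web"}
	fmt.Println(filterTags(tags, disabled))
}
```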
### Motivation
I wanted to exclude tags from the Kubernetes workload for specific pods; this PR allows doing that at runtime.
Example: `pod_name` is high cardinality; I want it enabled for some services, but not for others.
### Additional Notes
It reuses an existing annotation and is compatible with previous use cases of the annotation (only `kube_service` was supported before my change).
### Possible Drawbacks / Trade-offs
### Describe how to test/QA your changes
Tested with unit tests; planning to roll out the agent with the changes to a subset of our infrastructure.
### Reviewer's Checklist
- [ ] If known, an appropriate milestone has been selected; otherwise the `Triage` milestone is set.
- [ ] Use the `major_change` label if your change either has a major impact on the code base, is impacting multiple teams, or is changing important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a release note.
- [ ] A release note has been added or the `changelog/no-changelog` label has been applied.
- [ ] Changed code has automated tests for its functionality.
- [ ] Adequate QA/testing plan information is provided if the `qa/skip-qa` label is not applied.
- [ ] At least one `team/..` label has been applied, indicating the team(s) that should QA this change.
- [ ] If applicable, docs team has been notified or an issue has been opened on the documentation repo.
- [ ] If applicable, the `need-change/operator` and `need-change/helm` labels have been applied.
- [ ] If applicable, the `k8s/<min-version>` label has been applied, indicating the lowest Kubernetes version compatible with this feature.
- [ ] If applicable, the config template has been updated.
Can someone provide feedback on this PR?
@lmello maybe you need to contact support to get them to send engineers here to look at it? I'm interested in your feature being added, so I hope they'll expedite the process.
@2rs2ts I gave up; I contacted support and our TAM, and it didn't move up. We ended up building our own pipeline using Vector to sanitize and clean tags before shipping metrics and logs to Datadog.
Boy is that a common story. Sorry to hear it, but I'm not surprised; it feels like engineering must be a skeleton crew over there
It's amazing that someone at Datadog won't even ack this PR. The company is just telling us that all they care about is money, and it's making firewalling tags in metrics so hard that no one wants to do it.
This pull request has been automatically marked as stale because it has not had activity in the past 15 days.
It will be closed in 30 days if no further activity occurs. If this pull request is still relevant, adding a comment or pushing new commits will keep it open. Also, you can always reopen the pull request if you missed the window.
Thank you for your contributions!
Incredible that if the Datadog engineers just don't get to a PR in a month, the bot will just close it. That's insane and will definitely lead to a lot of wasted work. And spamming OP with messages doesn't help anybody...
Hi @2rs2ts,
Thank you for taking the time to share your feedback. We recently introduced a system to help manage long-standing issues through automated stale management. We understand this may have come as a surprise for some existing issues. We want to apologize if it felt that way. It was not our intent, and we want to reassure you that every issue is important to us and reviewed carefully. For your specific issue, we’re reviewing it to ensure the discussion progresses and receives the right level of attention. Thank you for helping improve the Datadog Agent.
— The Datadog Agent Team
Hi @lmello,
First of all, I want to sincerely apologize for the long silence on this pull request. You’re absolutely right; it’s completely unacceptable for a contribution to wait this long, and I’m truly sorry that your work didn’t receive the attention it deserved. This PR unfortunately fell through the cracks due to a missing triage label, and we’ve started an internal initiative to ensure this kind of situation doesn’t happen again.
Regarding the intent of this change: the need you identified is totally valid, and we’ve heard similar feedback from other users over time. However, we believe the specific implementation in this PR could introduce a few significant issues.
- **Metric collisions**: Allowing arbitrary tag removal can cause collisions in the Agent's metric aggregator (for example, with gauge metrics), potentially producing inaccurate values.
  - For instance, removing the `pod_name` tag when two pods from the same deployment run on the same host can merge their metrics incorrectly.
- **Impact on official integrations**: Since the tag removal happens at the "tagger" component level, it would also affect built-in integration metrics and internal product features that rely on consistent tagging, leading to degraded functionality and user experience.
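The collision point can be illustrated with a small Go sketch. This is not the Agent's actual aggregator; it only mimics the general idea that series are distinguished by their tag set, so stripping `pod_name` makes two pods' gauges share one context:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// contextKey joins sorted tags into an aggregation key, loosely mimicking how
// a metric aggregator distinguishes series by their tag set.
func contextKey(metric string, tags []string) string {
	sorted := append([]string(nil), tags...)
	sort.Strings(sorted)
	return metric + "|" + strings.Join(sorted, ",")
}

// stripPodName drops the pod_name tag, as the proposed annotation would.
func stripPodName(tags []string) []string {
	kept := tags[:0:0]
	for _, t := range tags {
		if !strings.HasPrefix(t, "pod_name:") {
			kept = append(kept, t)
		}
	}
	return kept
}

func main() {
	// Two pods of the same deployment running on the same host.
	a := []string{"pod_name:web-1", "kube_deployment:web", "host:node-1"}
	b := []string{"pod_name:web-2", "kube_deployment:web", "host:node-1"}

	// With pod_name present, the gauges stay on distinct series...
	fmt.Println(contextKey("mem.usage", a) == contextKey("mem.usage", b)) // false

	// ...but once it is stripped, both map to the same context, and one
	// pod's gauge silently overwrites the other's.
	fmt.Println(contextKey("mem.usage", stripPodName(a)) == contextKey("mem.usage", stripPodName(b))) // true
}
```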
The good news is that over the past year we’ve developed a safer and more flexible approach that addresses this same need. You can now configure tag cardinality for DogStatsD metrics directly, either at the Agent, client, or even metric level.
- The Agent setting `dogstatsd_tag_cardinality` supports `none`, `low`, `orchestrator`, or `high`.
  - For Kubernetes, we recommend `orchestrator` to avoid the type of metric collisions mentioned above.
- DogStatsD client libraries now also support configuring cardinality:
  - Globally via the `DD_CARDINALITY` environment variable (set in the application's environment) or a client option.
  - Per-metric when submitting new data points.
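For reference, the Agent-level option is a one-line setting in `datadog.yaml`; this fragment is only an illustration of the key and values named above:

```yaml
# datadog.yaml: tag cardinality for DogStatsD metrics.
# Valid values: none, low, orchestrator, high.
dogstatsd_tag_cardinality: orchestrator
```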
You can find more details here:
- Assigning tags and tag cardinality
- DogStatsD developer guide
- DogStatsD protocol reference
- DogStatsD metrics submission
Would this new capability address the original problem you were trying to solve?
If not, we’d love to continue the conversation to explore other options within the Agent or the client libraries.
Your feedback is extremely valuable, and we truly appreciate your patience and the effort you’ve put into contributing to the project.
Thanks again for raising this, and sorry once more for the delay.
Can't speak for OP, but FWIW, I think I filed a support ticket saying "please implement this person's FR" a long time ago, so I can answer for myself. And then with OP's feedback maybe we will have a picture of where to go from there.
So, the ability to set cardinality levels at the DogStatsD client is pretty good. I would want to see the same for checks cardinality. A problem we've had to deal with is needing to add custom tags of different names for stuff that exposes prometheus metrics that give you stats on pods, nodes, etc. The prometheus check will collect the name of the pod, node, etc. that's providing those metrics, which would get conflated with the prometheus labels trying to convey the same kind of info, but about other pods/nodes. So, we'd just create custom labels or mappings, adding to our cardinality and making monitors and dashboards way more annoying to write.
Also, the way you can only modify the cardinality level with low granularity is disappointing. It'd be better to get as granular as possible. But yeah, overriding a particular client to use low cardinality instead of orchestrator, then having to add in some tags back manually, could be useful for some cases. It's definitely better than nothing, and I appreciate it existing.
Hi @2rs2ts 👋
> So, the ability to set cardinality levels at the DogStatsD client is pretty good. I would want to see the same for checks cardinality.
Good news — this capability already exists for check instances. The documentation is a bit hard to find, but you can configure per-check tag cardinality here:
https://docs.datadoghq.com/getting_started/integrations/#per-check-tag-configuration
Let us know if this works for your Prometheus checks.
If you need even more control over the data exposed by a Prometheus/OpenMetrics check, you can also build a custom integration as described here:
https://docs.datadoghq.com/developers/custom_checks/prometheus/
@clamoriniere
> Good news — this capability already exists for check instances.
Thanks, that's great to hear! I wish it was easier to find that stuff in the docs, but then again, it's not like I actively looked for it; I just assumed, based on the earlier reply, that it was only for StatsD clients.
> Let us know if this works for your Prometheus checks.
I'll test it out next time it comes up, but FWIW, rather than "working" for me, it just probably gets me closer to my ideal setup (e.g., get rid of a lot of stuff I don't want, at the cost of a few things I do want that I will have to add back in manually or possibly do without.) I believe this will be true for most users: their particular situations call for different sets of tags, and the cardinality levels are not granular enough to align for most of them. But, again, this is way better than not having the option!
> If you need even more control over the data exposed by a Prometheus/OpenMetrics check, you can also build a custom integration
Yeah, in our case we were scraping metrics we created ourselves, so implementing a custom check was not necessary (we'd have to do this work either way; might as well create only one moving part rather than two).