telegraf icon indicating copy to clipboard operation
telegraf copied to clipboard

feat(outputs.nebius_cloud_monitoring): Add 'service' configuration setting

Open abrekhov opened this issue 2 years ago • 9 comments

Summary

Internal dev team of Nebius Managed Services came to me and asked to make this parameter configurable.

Checklist

  • [X] No AI generated code was used in this PR

Related issues

resolves #14657

abrekhov avatar Jan 31 '24 00:01 abrekhov

@abrekhov can you explain how this parameter can be used?!? It says "normally is should not be changed", so I wonder when to change it and where to find the value to change it to...

srebhan avatar Jan 31 '24 12:01 srebhan

@abrekhov can you explain how this parameter can be used?!? It says "normally is should not be changed", so I wonder when to change it and where to find the value to change it to... Hi @srebhan ! We've already discussed this topic in the first PR https://github.com/influxdata/telegraf/pull/13379#issuecomment-1583330289

But few weeks ago DevTeam of Nebius Cloud came to me and said that they want to use telegraf in service monitoring (from the side of the cloud). And they use this service parameter to identify to which service this metrics belongs (e.g. managed postgres cluster, or data-proc service, etc.). So this comment about "normally is should not be changed" is for end customers who want to gather custom metrics from VMs.

Could you please also point me where did I fail with the semantic of PR?

abrekhov avatar Feb 02 '24 11:02 abrekhov

Could you please also point me where did I fail with the semantic of PR?

If your PR only contains a single commit, the semantic checker uses that to check for a semantic message. In your case:

make service configurable again

When you have more than one commit, then the checker will look at the PR title.

powersj avatar Feb 02 '24 13:02 powersj

@abrekhov - what is the consequence of changing that value? As in, if I am a user and change it to "backend" thinking I am monitoring my own backend, will metrics get rejected? Or show up differently?

powersj avatar Feb 02 '24 13:02 powersj

@abrekhov - what is the consequence of changing that value? As in, if I am a user and change it to "backend" thinking I am monitoring my own backend, will metrics get rejected? Or show up differently?

As I understand this metrics will be rejected. But if devteam starts to monitor VMs from managed service side and set it to managed-postgres, then it will be processed and user will see such metrics from managed service :

image

abrekhov avatar Feb 02 '24 17:02 abrekhov

@abrekhov,

@srebhan and I are both concerned that this will cause confusion and mistakes by users. However, we would love to enable the backend team to utilize telegraf. Could we instead of a config option have the backend team set an environment variable which changes this setting?

That way the option stays out of the config a user would see and it would prevent possibly setting this incorrectly.

Thanks!

powersj avatar Feb 06 '24 20:02 powersj

@abrekhov,

@srebhan and I are both concerned that this will cause confusion and mistakes by users. However, we would love to enable the backend team to utilize telegraf. Could we instead of a config option have the backend team set an environment variable which changes this setting?

That way the option stays out of the config a user would see and it would prevent possibly setting this incorrectly.

Thanks!

Great idea! Could you provide some example in other plugin where it is implemented?

abrekhov avatar Feb 12 '24 19:02 abrekhov

Could you provide some example in other plugin where it is implemented?

Essentially you will want to override the value if an environment variable like NEBIUS_SERVICE is set via os.Getenv("NEBIUS_SERVICE").

So you can update your logic to something like:

if a.service == "" {
    a.service = "custom"
}
if service := os.Getenv("NEBIUS_SERVICE"); service != "" {
    a.service = service
}

Does that help?

powersj avatar Feb 12 '24 20:02 powersj

Thanks!

Could you please explain why lint-linux job is failing over and over again?

abrekhov avatar Feb 21 '24 15:02 abrekhov

Ah I had missed that:

Please run "make docs" and push the updated README. Failing CI.

You made two changes:

  1. You updated the sample.conf, but didn't update the corresponding readme. The readme includes an embedded copy of the sample.conf and they need to stay in sync. Running make docs will fix this.
  2. You also updated the yandex cloud monitoring readme, where the sample.conf is embedded and they are different. So again you need to keep both files in sync. Running make docs won't fix this one, as make docs takes the sample.conf and embeds it in the readme.

To be honest, the yandex change doesn't really belong as a part of this PR and I hesitated when I saw that, but if you make the corresponding update to the sample.conf we can include it.

powersj avatar Feb 21 '24 15:02 powersj

Ah I had missed that:

Please run "make docs" and push the updated README. Failing CI.

You made two changes:

  1. You updated the sample.conf, but didn't update the corresponding readme. The readme includes an embedded copy of the sample.conf and they need to stay in sync. Running make docs will fix this.
  2. You also updated the yandex cloud monitoring readme, where the sample.conf is embedded and they are different. So again you need to keep both files in sync. Running make docs won't fix this one, as make docs takes the sample.conf and embeds it in the readme.

To be honest, the yandex change doesn't really belong as a part of this PR and I hesitated when I saw that, but if you make the corresponding update to the sample.conf we can include it.

I've executed make docs twice: before previous commit and last one (cleanup). I've deleted changes that not correspond to nebius plugin. Hope that pipeline will succeed.

abrekhov avatar Feb 22 '24 01:02 abrekhov

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. Downloads for additional architectures and packages are available below.

:partying_face: This pull request decreases the Telegraf binary size by -3.20 % for linux amd64 (new size: 221.5 MB, nightly size 228.8 MB)

:package: Click here to get additional PR build artifacts

Artifact URLs

DEB RPM TAR GZ ZIP
amd64.deb aarch64.rpm darwin_amd64.tar.gz windows_amd64.zip
arm64.deb armel.rpm darwin_arm64.tar.gz windows_arm64.zip
armel.deb armv6hl.rpm freebsd_amd64.tar.gz windows_i386.zip
armhf.deb i386.rpm freebsd_armv7.tar.gz
i386.deb ppc64le.rpm freebsd_i386.tar.gz
mips.deb riscv64.rpm linux_amd64.tar.gz
mipsel.deb s390x.rpm linux_arm64.tar.gz
ppc64el.deb x86_64.rpm linux_armel.tar.gz
riscv64.deb linux_armhf.tar.gz
s390x.deb linux_i386.tar.gz
linux_mips.tar.gz
linux_mipsel.tar.gz
linux_ppc64le.tar.gz
linux_riscv64.tar.gz
linux_s390x.tar.gz

telegraf-tiger[bot] avatar Feb 22 '24 02:02 telegraf-tiger[bot]

Awesome! Thanks @abrekhov!

@srebhan @powersj Thank you guys!

abrekhov avatar Feb 22 '24 09:02 abrekhov