
Exclude devices/VMs via a config context setting

Open willfurnell opened this issue 2 years ago • 7 comments

Hello, would it be possible to exclude devices/VMs from the api/plugins/prometheus-sd/devices/ API endpoint, please? For context: we're looking at making a low-weight config context with something like

{
    "prometheus-plugin-prometheus-sd": {
        "monitor": "false"
    }
}

in it, then a higher-weight one that we can apply to certain roles, sites etc. with more useful information:

{
    "prometheus-plugin-prometheus-sd": {
        "monitor": "true",
        "metrics_path": "/metrics",
        "port": 9100,
        "scheme": "http"
    }
}
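
For what it's worth, NetBox merges every config context that applies to an object, with higher-weight contexts overriding lower-weight ones key by key, so a device matched by both contexts above would render its context as:

{
    "prometheus-plugin-prometheus-sd": {
        "monitor": "true",
        "metrics_path": "/metrics",
        "port": 9100,
        "scheme": "http"
    }
}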

This would then let us get just the devices we'd like to monitor, without having to filter by roles in the API and keep the matching config contexts for those roles in sync inside NetBox. Effectively, by only changing config contexts in NetBox, your list of devices to monitor is updated automatically without you needing to change any Prometheus config at all!

e.g. you wouldn't need api/plugins/prometheus-sd/devices/?role=x&role=y any more, because you could do it all via NetBox. What do you think? Thanks!

willfurnell avatar Mar 04 '24 15:03 willfurnell

Or alternatively, what about a config option (to retain backwards compatibility) that only shows devices at the API endpoint if they have something configured in the prometheus-plugin-prometheus-sd config context (since otherwise we don't want to monitor them)?

willfurnell avatar Mar 04 '24 15:03 willfurnell

It's unclear what exactly you want to filter. The most generic way is to set up Prometheus once; then SD does the work for you. For us it looks like this:

http_sd_configs:
  - authorization:
      type: Token
      credentials: <secret>
    refresh_interval: 10m
    url: https://<netbox>/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true&tenant__n=null&cf_prometheus_server={{ inventory_hostname }}
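
For reference, any Prometheus http_sd_configs source, including this plugin, must return the standard HTTP SD JSON shape: a list of target groups, each with targets and labels. An illustrative response (the values are made up; the label names are the ones that appear later in this thread):

[
  {
    "targets": ["10.0.0.12"],
    "labels": {
      "__meta_netbox_primary_ip": "10.0.0.12",
      "__meta_netbox_tenant_slug": "sometenant",
      "__meta_netbox_role_slug": "somerole"
    }
  }
]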

k0ste avatar Mar 04 '24 15:03 k0ste

Sorry I'll try and be more clear!

So by default, the devices and virtual machines API endpoints return literally all devices/VMs. We do not plan to install node_exporter on every single device and VM - and in fact we can't, because lots of devices are things like appliances, patch panels, switches and power distribution units.

Currently, we could use the Netbox filters as part of the API URL like you show, e.g. https://<netbox>/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true&tenant=sometenant&role=somerole which is fine, but it means whenever we want to add a new role or device category to monitor, we'd need to update our http_sd_configs configuration in Prometheus (as we're slowly rolling it out across the estate).

We already need to use the config context in NetBox to define the port, scheme and metrics path - like so (screenshot attached) - and the cool thing about config contexts is that we can conditionally apply them based on a load of factors, such as roles (as seen there), but also sites, tenants etc., so we can make multiple config contexts with really specific (or not so specific) apply rules.

What would be really useful is if we could configure the API endpoint to only return devices and VMs that have the config context applied to them, with no other query parameters (i.e. just https://<netbox>/api/plugins/prometheus-sd/devices/). The data returned would then be just the devices we want to monitor, and if we want to change which devices are monitored, we only need to change the config context(s) in NetBox, without touching the Prometheus configuration at all, because the URL used in Prometheus never changes. Effectively, we do all the filtering on the NetBox side!

willfurnell avatar Mar 05 '24 08:03 willfurnell

For filtering like you describe, we use service templates (screenshot attached). An administrator who wants to scrape, for example, an SSL certificate or an S3 bucket configures a service on the device or virtual machine, plus a custom field telling the exporter where to go. This is, so to speak, the classic way to configure Prometheus as a service:

- job_name: s3_exporter
  metrics_path: /probe
  relabel_configs:
  - separator: ;
    regex: __address__
    replacement: nb_fqdn
    action: labelmap
  - source_labels: [__meta_netbox_services]
    separator: ;
    regex: (.*)(s3_exporter)(.*)
    replacement: $2
    action: keep              #<------------ keep targets where this service is assigned
  - source_labels: [__meta_netbox_custom_field_s3_exporter_bucket]
    separator: ;
    regex: (.+)
    replacement: $1
    action: keep              #<------------ keep targets where bucket is non-empty
  - source_labels: [__meta_netbox_custom_field_s3_exporter_bucket]
    separator: ;
    regex: (.*)
    target_label: __param_bucket
    replacement: $1
    action: replace
  - source_labels: [__meta_netbox_primary_ip]
    separator: ;
    regex: (.*)
    target_label: __address__
    replacement: $1:9340
    action: replace
  - source_labels: [__meta_netbox_tenant_slug]
    separator: ;
    regex: (.*)
    target_label: tenant
    replacement: $1
    action: replace
  - source_labels: [__meta_netbox_role_slug]
    separator: ;
    regex: (.*)
    target_label: role
    replacement: $1
    action: replace
  - source_labels: [__meta_netbox_custom_field_environment]
    separator: ;
    regex: (.*)
    target_label: environment
    replacement: $1
    action: replace
  http_sd_configs: #<---------------- support devices & virtual-machines
  - authorization:
      type: Token
      credentials: <secret>
    refresh_interval: 10m
    url: https://<netbox>/api/plugins/prometheus-sd/devices/?status=active&has_primary_ip=true&tenant__n=null&cf_prometheus_server={{ inventory_hostname }}
  - authorization:
      type: Token
      credentials: <secret>
    refresh_interval: 10m
    url: https://<netbox>/api/plugins/prometheus-sd/virtual-machines/?status=active&has_primary_ip=true&tenant__n=null&cf_prometheus_server={{ inventory_hostname }}

As I understand it, you want to move the exporter settings, such as port and metrics_path, over to the NetBox side. It seems to me that in this case the flexibility of the keep/drop actions and of filling the __param_* values would be lost.

k0ste avatar Mar 05 '24 08:03 k0ste

Does this stop the devices from being scraped in the first place? That's what I'm mainly trying to avoid, as at the moment my scrape target stats look like this, just for one tenant! (screenshot attached)

willfurnell avatar Mar 05 '24 09:03 willfurnell

The keep/drop actions include/exclude a target from the scrape targets of a job. This means that if your default service is node_exporter (a generic exporter) and you want to set up an additional exporter like ssl_exporter, your generic exporter will always scrape all your targets, while the additional exporter will only pick up targets that are set up properly.

k0ste avatar Mar 05 '24 10:03 k0ste

Okay, this may be a bit of a hack, but I've figured out how to do what I want... I've added this to my Prometheus config under relabel_configs:

  - source_labels:
    - __address__
    regex: "(.*):999999999"
    action: drop

and then I've added an additional config context in NetBox, applying to everything with priority (weight) 1 (screenshot attached). This means that by default all devices get port 999999999 assigned. I then have a config context with a higher priority that is applied only to the roles etc. that I actually want to monitor (screenshot attached).
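
Reconstructed from the description above (the screenshots only show the idea, and the key name follows the earlier comments), the catch-all context at weight 1 would look roughly like:

{
    "prometheus-plugin-prometheus-sd": {
        "port": 999999999
    }
}

and the higher-weight context, applied only to the roles to monitor, like:

{
    "prometheus-plugin-prometheus-sd": {
        "metrics_path": "/metrics",
        "port": 9100,
        "scheme": "http"
    }
}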

Then the relabel rule above means that devices with roles I don't want to monitor will not be scraped. Importantly, if I want to change which roles are scraped in future, I don't have to change my Prometheus configuration at all: I can do it all within NetBox, because the http_sd_configs URL never changes from https://netbox/api/plugins/prometheus-sd/devices/?tenant=xyz&has_primary_ip=true!

willfurnell avatar Mar 05 '24 11:03 willfurnell