Add support for targeting services for scraping
An increasing number of providers are placing their metrics endpoints behind a Service rather than exposing them directly from a Deployment; Kyverno and ArgoCD are notable examples. Other Prometheus collector implementations have a CRD that lets you scrape metrics by targeting a Service: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/crds/crd-servicemonitors.yaml
The only way to target these endpoints currently is to ignore the Service, dig through the Deployment that's generating the metrics for a label you can use, and construct a PodMonitoring resource for each Deployment that generates metrics. It sucks because it turns what should be a single, simple monitoring resource into multiple monitoring resources that are generally more brittle.
There might be some way to use the PodMonitoring resource to target a Service that I'm just not aware of. Here's an example YAML, based on the prom-example sample in GCP's documentation, that includes a Service. I feel like this should work, but it doesn't...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prom-example
  namespace: gmp-test
  labels:
    app: prom-example
spec:
  selector:
    matchLabels:
      app: prom-example
  replicas: 3
  template:
    metadata:
      labels:
        app: prom-example
    spec:
      containers:
      - image: nilebox/prometheus-example-app@sha256:dab60d038c5d6915af5bcbe5f0279a22b95a8c8be254153e22d7cd81b21b84c5
        name: prom-example
        ports:
        - name: metrics
          containerPort: 1234
        command:
        - "/main"
        - "--process-metrics"
        - "--go-metrics"
---
apiVersion: v1
kind: Service
metadata:
  name: gmp-test-service
  namespace: gmp-test
spec:
  ports:
  - port: 5678
    protocol: TCP
    targetPort: metrics
  selector:
    app: prom-example
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: monitoring.googleapis.com/v1alpha1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: gmp-test
spec:
  selector:
    matchLabels:
      app: prom-example
  endpoints:
  - port: 5678
    interval: 5s
Hello,
We currently support Pod scraping only due to potential scalability concerns with monitoring services and endpoints on larger clusters.
What about using the label selector defined on the Service's .spec.selector in your PodMonitoring's .spec.selector.matchLabels and specifying the underlying deployment's container port in .endpoints?
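For example, a minimal sketch of that workaround against the prom-example manifests above (field values are illustrative): the PodMonitoring selects on the same app: prom-example labels the Service selects on, and points at the named container port metrics rather than the Service port.

apiVersion: monitoring.googleapis.com/v1alpha1
kind: PodMonitoring
metadata:
  name: prom-example
  namespace: gmp-test
spec:
  selector:
    matchLabels:
      app: prom-example   # same labels the Service's .spec.selector uses
  endpoints:
  - port: metrics         # the container port (name or number), not the Service port 5678
    interval: 30s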
That's pretty much what we're doing right now; it just translates into more work for services like ArgoCD that publish metrics from multiple endpoints.
Gotcha. That makes sense.
For the time being, we don't plan on supporting ServiceMonitoring, mainly due to the scalability concerns I mentioned earlier. PodMonitoring also generally fits most use cases, albeit occasionally with workarounds, as you stated.
If you would like to leverage the prometheus-operator ServiceMonitor CRD, another option is to replace the OSS image with the gke.gcr.io/prometheus-engine/prometheus image in the prometheus-operator or kube-prometheus stack.
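As a rough sketch of what that override could look like in the kube-prometheus-stack Helm chart (the exact values keys vary by chart version, and the tag is a placeholder to be replaced with a published prometheus-engine release):

# values.yaml snippet for kube-prometheus-stack; keys may differ across chart versions
prometheus:
  prometheusSpec:
    image:
      registry: gke.gcr.io
      repository: prometheus-engine/prometheus
      tag: "<GMP release tag>"   # placeholder; use a tag published by prometheus-engine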
If I have N pods running behind my service, does a matching PodMonitoring scrape all of them at the same time? I have some metrics I don't want to scrape in parallel - getting the metrics from just one pod is enough - and for this, I think I'll need ServiceMonitoring.
@reith - Prometheus does not scrape targets at the same time, but uses an offset algorithm to spread the load amongst the targets found in a job.
Can you describe more what you're trying to scrape behind your service?
@pintohutch The metric I'm trying to push is directly read from a DB; it's not a per-pod metric and I don't need to calculate the metric - run the query - for each pod.
I know it doesn't sound perfect to push these metrics from pods, but they fit my architecture best. Ideally, I'd prefer GCP Monitoring to make a chart from the data that is also replicated to BigQuery, but that doesn't seem viable. I could also create another single-pod deployment, but it brings some overhead.
The metric I'm trying to push is directly read from a DB; it's not a per-pod metric and I don't need to calculate the metric - run the query - for each pod.
IIUC you have N pods, each with essentially the same set of metrics you're trying to scrape (i.e. foo_total=1 on all N pods?) And you don't want to have to write N copies of the data to GMP?
I know it doesn't sound perfect to push these metrics from pods, but they fit my architecture best. Ideally, I'd prefer GCP Monitoring to make a chart from the data that is also replicated to BigQuery, but that doesn't seem viable. I could also create another single-pod deployment, but it brings some overhead.
@lyanco may be able to answer any questions you have around product gaps in Google Cloud Monitoring, but I think we'd need a little more detail.
The metric I'm trying to push is directly read from a DB; it's not a per-pod metric and I don't need to calculate the metric - run the query - for each pod.
IIUC you have N pods, each with essentially the same set of metrics you're trying to scrape (i.e. foo_total=1 on all N pods?) And you don't want to have to write N copies of the data to GMP?
That's right. I think a ServiceMonitor would let me do this but I haven't worked with other implementations of Prometheus operators.
I think you can do this with PodMonitoring by adding a specific label to one of the pods and then adding that label to the selector field in the PodMonitoring.
Yea - if there's a particular pod that is, say leader: true or something, that could work.
It could also work if you have a special label on your metrics from one of the pods using metricRelabeling to drop the time series from other pods.
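A rough sketch of that idea, assuming the "leader" pod adds a hypothetical role="leader" label to the metrics it exposes so series from every other pod can be dropped at scrape time:

apiVersion: monitoring.googleapis.com/v1alpha1
kind: PodMonitoring
metadata:
  name: prom-example-leader-only
  namespace: gmp-test
spec:
  selector:
    matchLabels:
      app: prom-example
  endpoints:
  - port: metrics
    interval: 30s
    metricRelabeling:
    - sourceLabels: [role]   # hypothetical label exposed only by the leader pod
      regex: leader
      action: keep           # keep only series labeled role="leader", drop the rest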
The fundamental reason we haven't supported service-based monitoring is scaling concerns when running Prometheus as a DaemonSet: having every collector pod in a 1,000-node cluster watching K8s endpoints stresses the API server.
I think you can do this with PodMonitoring by adding a specific label to one of the pods and then adding that label to the selector field in the PodMonitoring.
The pods are part of a Deployment and have the same set of labels. I think it's an anti-pattern to treat specific pods of Deployments differently. I don't want to bother myself with relabeling pods once a pod crashes or the number of replicas decreases.
It could also work if you have a special label on your metrics from one of the pods using metricRelabeling to drop the time series from other pods.
I'd also like to decrease the number of redundant readings, not just the number of samples pushed to GMP.
The fundamental reason we haven't supported service-based monitoring is scaling concerns when running Prometheus as a DaemonSet: having every collector pod in a 1,000-node cluster watching K8s endpoints stresses the API server.
I don't understand why it'd need to watch endpoints. The operator could watch services and configure Prometheus to scrape cluster IPs, couldn't it?
I don't understand why it'd need to watch endpoints.
If we wanted service monitoring, it may be preferable to watch Endpoints so we can get the service labels, as well as the pod or node hosting the service (via __meta_kubernetes_endpoint_address_target_kind), to enrich the target relabeling we do (this is what prometheus-operator does, for example).
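For context, this is roughly what plain Prometheus Endpoints-based discovery provides. A sketch of a scrape config using those meta labels (the job name and relabeling rules are illustrative, and this is upstream Prometheus syntax, not GMP's):

scrape_configs:
- job_name: service-endpoints   # example job name
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind]
    regex: Pod
    action: keep                # only scrape endpoint addresses backed by Pods
  - source_labels: [__meta_kubernetes_service_name]
    target_label: service       # attach the Service name to the scraped series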
The operator could watch services and configure Prometheus to scrape cluster IPs, couldn't it?
Indeed, but that would essentially fold Prometheus service discovery features into the operator, which would be a pretty big expansion in concerns and complexity, not to mention the risk of OOMing the operator in larger clusters with dozens or hundreds of services (which is bad for a lot of reasons, not least because it also serves as our webhook server).
This is not to say it's impossible or that we won't pursue this feature some day, but it just hasn't been a high-enough priority to warrant feature development, as most users can get by with pod-level monitoring.
I had a discussion with @pintohutch regarding how to watch endpoints efficiently for service monitoring. Although it's more of a general discussion than GMP-specific, we both think others may find it helpful. So I'll post it here:
@simonpasquier @ArthurSens Please check my comment above. For more details please check my (just submitted) GSoC proposal.