Daniel Clark
This was running too slowly locally and is usually not needed during normal iteration. Move the multi-arch images to cloudbuild. Also add some doc nits in the manifests.
Closes #

## 📑 Description

Added a Prometheus integration with two analyzers:

1. `PrometheusConfigValidate`
2. `PrometheusConfigRelabelReport`

The integration does not deploy any Prometheus stack in the cluster. Instead, it searches...
Hello - I'm trying to see if it's possible to deploy NVIDIA DCGM on K8s with the `securityContext.privileged` field set to `false` for security reasons. I was able to get...
It is tempting to rely only on defaulting webhooks to ensure that any change to the OperatorConfig updates the .collection.externalLabels and rules.externalLabels fields with the default project, location, and cluster labels....
The feature [doesn't scale well](https://github.com/GoogleCloudPlatform/prometheus-engine/issues/774) in larger clusters with lots of PodMonitorings. Let's explore ways to reduce the resource footprint of the operator when this feature is enabled. Acceptance Criteria:...
In cases where the webhooks can't reach the operator (e.g. operator OOMs), is it worth trying out a [`failurePolicy=Ignore`](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy) in some cases? Acceptance Criteria: - Assess trade-offs of "failing open"...
If this is feasible, this would be nice as its API is easier to manage and allows us to avoid regenerating clientsets that are only used for testing.
So it can be used to scrape the kubelet in clusters with self-signed certs (e.g. kind). Akin to https://github.com/GoogleCloudPlatform/prometheus-engine/issues/223 but for NodeMonitoring.
Can we find ways to avoid OOM crashes in the gmp-operator? Maybe using a [VPA](https://gist.github.com/pintohutch/65bc578f1ca7f9d07ad44ff944168bb6)? Acceptance criteria: - Proposal with design and trade-offs
The hardcoded `scrape_config` for the kubelet does [not include](https://github.com/GoogleCloudPlatform/prometheus-engine/blob/f1923f31bfc1c75457198674d865b45630938afc/pkg/operator/collection.go#L558-L563) `project_id`, `location`, or `cluster`, which is in contrast to the `scrape_config` [relabeling](https://github.com/GoogleCloudPlatform/prometheus-engine/blob/f1923f31bfc1c75457198674d865b45630938afc/pkg/operator/apis/monitoring/v1/types.go#L626-L642) from `PodMonitoring`. In practice, this isn't a big...