Grafana: embeddable user namespace metrics

Open vexingly opened this issue 3 years ago • 2 comments

  • [ ] Grafana dashboard with namespace as a variable to display user pod metrics; panels should be sized so they can be embedded
  • [ ] Metrics should then be embedded into kubeflow with the kubecost information

vexingly avatar Oct 03 '22 13:10 vexingly

Suggested metrics:

# cpu resources
sum(rate(container_cpu_usage_seconds_total{namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}[2m])) by (container, namespace)
sum(kube_pod_container_resource_requests{resource="cpu",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
sum(kube_pod_container_resource_limits{resource="cpu",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)

# memory resources
sum(container_memory_working_set_bytes{namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
sum(kube_pod_container_resource_requests{resource="memory",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
sum(kube_pod_container_resource_limits{resource="memory",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)

# filesystem
sum(rate(container_fs_writes_bytes_total{container!="",  namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}[2m])) by (container, namespace)
sum(rate(container_fs_reads_bytes_total{container!="",  namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}[2m])) by (container, namespace)

# network activity
sum(rate(container_network_receive_bytes_total{namespace="pat-ledgerwood", pod!~"ml-pipeline.*"}[2m])) by (pod, namespace)
sum(rate(container_network_transmit_bytes_total{namespace="pat-ledgerwood", pod!~"ml-pipeline.*"}[2m])) by (pod, namespace)

# active notebooks (list)
count(kube_pod_labels{label_notebook_name=~".+"}) by (namespace, pod, label_data_statcan_gc_ca_classification)

vexingly avatar Oct 03 '22 18:10 vexingly

Example: https://grafana.aaw.cloud.statcan.ca/d/WnBiorG4z/notebook-resources?orgId=1

<iframe src="https://grafana.aaw.cloud.statcan.ca/d-solo/WnBiorG4z/notebook-resources?orgId=1&from=1666102521785&to=1666124121786&panelId=2" 
width="450" height="200" frameborder="0"></iframe>

vexingly avatar Oct 18 '22 20:10 vexingly

In all those suggested metrics, do we want namespace='pat-ledgerwood' applied to each one of them? My gut feeling says probably not.

mathis-marcotte avatar Nov 09 '22 14:11 mathis-marcotte

Flattering, but no... we would want to create a variable on the dashboard for the namespace, and then the visualization would use the variable. This should allow the namespace to be customized as part of the url! :)
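
A minimal sketch of how the suggested queries could be turned into dashboard-ready ones, assuming the dashboard variable is named `namespace` (the name is not settled in this thread). The `templatize` helper is hypothetical; it only swaps the hard-coded namespace matcher for Grafana's `$namespace` placeholder and leaves the rest of the query untouched.

```python
import re

# Matches an exact-match namespace selector such as namespace="pat-ledgerwood"
# or namespace='pat-ledgerwood'. Regex and negative matchers are left alone.
NAMESPACE_MATCHER = re.compile(r"""\bnamespace\s*=\s*(["'])[^"']*\1""")


def templatize(query: str, variable: str = "namespace") -> str:
    """Replace a hard-coded namespace with a Grafana dashboard variable.

    The variable name is an assumption; use whatever the dashboard defines.
    """
    return NAMESPACE_MATCHER.sub(f'namespace="${variable}"', query)


if __name__ == "__main__":
    original = (
        'sum(container_memory_working_set_bytes{namespace="pat-ledgerwood", '
        'pod!~"ml-pipeline.*"}) by (container, namespace)'
    )
    print(templatize(original))
```

With a variable defined this way, Grafana substitutes the selected value into each panel query when the dashboard is rendered.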

vexingly avatar Nov 09 '22 15:11 vexingly

Okay, good. I was looking into that. So we will just have the namespace value of the query be a variable then. Would we want the option to view multiple namespaces at the same time?

mathis-marcotte avatar Nov 09 '22 15:11 mathis-marcotte

I think a single namespace is fine for this, since it will be only displayed in kubeflow for a single namespace.

We may want the period to be configurable, but that's not really necessary for this implementation.

vexingly avatar Nov 09 '22 18:11 vexingly

Ok thanks for confirming, I thought that would be most likely. When you say period, do you mean the time range on the grafana dashboard? That is definitely configurable in the url to some degree, like with the namespace variable value.

Also, did we have a preference for where to embed the metrics in kubeflow?

mathis-marcotte avatar Nov 09 '22 18:11 mathis-marcotte

I believe that the allow_embedding configuration (https://grafana.com/docs/grafana/v9.0/setup-grafana/configure-grafana/#allow_embedding) is set to false for our Grafana instance. This is blocking us from embedding metrics in kubeflow.
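
For reference, `allow_embedding` lives in the `[security]` section of Grafana's configuration and defaults to false. The sketch below is a hypothetical check, not something run against our instance: it assumes the configuration text contains only sectioned keys (a full `grafana.ini` may start with section-less keys, which `configparser` rejects), and `embedding_allowed` is an illustrative name.

```python
from configparser import ConfigParser


def embedding_allowed(ini_text: str) -> bool:
    """Return True if [security] allow_embedding is enabled in the given text."""
    # Interpolation is disabled so literal '%' characters in values are harmless.
    parser = ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # Grafana treats a missing key as false, so mirror that default here.
    return parser.getboolean("security", "allow_embedding", fallback=False)


if __name__ == "__main__":
    sample = """
[security]
; set to true to allow browsers to render Grafana in an iframe
allow_embedding = true
"""
    print(embedding_allowed(sample))
```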

mathis-marcotte avatar Nov 09 '22 19:11 mathis-marcotte

@mathis-marcotte I'll put in a request for this to be made configurable for us from upstream (CNS), good catch!

vexingly avatar Nov 09 '22 20:11 vexingly

@vexingly Did we also want this one to filter by namespace?

# active notebooks (list)
count(kube_pod_labels{label_notebook_name=~".+"}) by (namespace, pod, label_data_statcan_gc_ca_classification)

mathis-marcotte avatar Nov 15 '22 20:11 mathis-marcotte

Yes, although that one might be less useful to display. It was more of an idea to help users, i.e. "your memory was high here because you had 6 running notebooks"... we can review how the graphs look with the team when it's working!

vexingly avatar Nov 15 '22 21:11 vexingly

Ok got it! And we can always review the dashboard with the graphs directly in grafana. It's just the embedding of graphs that isn't currently working.

mathis-marcotte avatar Nov 16 '22 14:11 mathis-marcotte

We decided to just add a link to the side menu in kubeflow that will open the grafana dashboard in a new tab, with the selected namespace as a parameter if there is one.
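
A rough sketch of how such a link could be built, not the actual kubeflow change. It reuses the dashboard URL from the example earlier in this thread and assumes the dashboard variable is named `namespace`; Grafana reads dashboard variables from `var-<name>` query parameters. The `dashboard_link` helper is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Dashboard URL from the example posted earlier in this thread.
DASHBOARD_URL = "https://grafana.aaw.cloud.statcan.ca/d/WnBiorG4z/notebook-resources"


def dashboard_link(namespace: Optional[str] = None) -> str:
    """Build the side-menu link, adding the namespace only when one is selected."""
    params = {"orgId": "1"}
    if namespace:
        # "var-namespace" assumes the dashboard variable is called "namespace".
        params["var-namespace"] = namespace
    return f"{DASHBOARD_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(dashboard_link())
    print(dashboard_link("pat-ledgerwood"))
```

When no namespace is selected, the link falls back to the plain dashboard and Grafana uses the variable's default value.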

mathis-marcotte avatar Nov 22 '22 17:11 mathis-marcotte