Grafana: embeddable user namespace metrics
- [ ] Grafana dashboard with namespace as a variable to display user pod metrics; panels should be sized so that they can be embedded
- [ ] Metrics should then be embedded into Kubeflow alongside the Kubecost information
Suggested metrics:
# cpu resources
sum(rate(container_cpu_usage_seconds_total{namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}[2m])) by (container, namespace)
sum(kube_pod_container_resource_requests{resource="cpu",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
sum(kube_pod_container_resource_limits{resource="cpu",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
# memory resources
sum(container_memory_working_set_bytes{namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
sum(kube_pod_container_resource_requests{resource="memory",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
sum(kube_pod_container_resource_limits{resource="memory",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
# filesystem
sum(rate(container_fs_writes_bytes_total{container!="", namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}[2m])) by (container, namespace)
sum(rate(container_fs_reads_bytes_total{container!="", namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}[2m])) by (container, namespace)
# network activity
sum(rate(container_network_receive_bytes_total{namespace="pat-ledgerwood", pod!~"ml-pipeline.*"}[2m])) by (pod, namespace)
sum(rate(container_network_transmit_bytes_total{namespace="pat-ledgerwood", pod!~"ml-pipeline.*"}[2m])) by (pod, namespace)
# active notebooks (list)
count(kube_pod_labels{label_notebook_name=~".+"}) by (namespace, pod, label_data_statcan_gc_ca_classification)
Example: https://grafana.aaw.cloud.statcan.ca/d/WnBiorG4z/notebook-resources?orgId=1
<iframe src="https://grafana.aaw.cloud.statcan.ca/d-solo/WnBiorG4z/notebook-resources?orgId=1&from=1666102521785&to=1666124121786&panelId=2"
width="450" height="200" frameborder="0"></iframe>
In all those suggested metrics, do we want namespace='pat-ledgerwood' applied to each one of them? My gut feeling says probably not.
Flattering, but no... we would want to create a variable on the dashboard for the namespace, and then the visualization would use the variable. This should allow the namespace to be customized as part of the url! :)
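To make that concrete, here is a rough sketch that rewrites two of the suggested queries so the hard-coded namespace becomes a dashboard variable. The variable name `namespace` is an assumption; it has to match whatever the variable ends up being called in the dashboard settings. The helper `templatize` is only illustrative.

```python
# Hypothetical helper: swap the hard-coded namespace for a Grafana
# dashboard variable. The variable name "namespace" is an assumption.
HARDCODED = 'namespace="pat-ledgerwood"'
TEMPLATED = 'namespace="$namespace"'

queries = [
    'sum(rate(container_network_receive_bytes_total{namespace="pat-ledgerwood", pod!~"ml-pipeline.*"}[2m])) by (pod, namespace)',
    'sum(rate(container_network_transmit_bytes_total{namespace="pat-ledgerwood", pod!~"ml-pipeline.*"}[2m])) by (pod, namespace)',
]


def templatize(query: str) -> str:
    """Replace the literal namespace matcher with the dashboard variable."""
    return query.replace(HARDCODED, TEMPLATED)


templated_queries = [templatize(q) for q in queries]
for q in templated_queries:
    print(q)
```

The same substitution would apply to the CPU, memory and filesystem queries, since they all use the same `namespace="pat-ledgerwood"` matcher.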
Okay, good. I was looking into that. So we will just have the namespace value in the query be a variable then. Would we want the option to view multiple namespaces at the same time?
I think a single namespace is fine for this, since it will be only displayed in kubeflow for a single namespace.
We may want the period to be configurable, but that's not really necessary for this implementation.
Ok, thanks for confirming, I thought that would be most likely. When you say period, do you mean the time range on the Grafana dashboard? That is definitely configurable in the URL to some degree, as is the namespace variable value.
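For illustration, a minimal sketch of building a single-panel URL with both the time range and the namespace set through query parameters. It assumes the dashboard variable is named `namespace` (Grafana reads variable values from `var-<name>` parameters) and uses a relative range of `now-6h` to `now` as an arbitrary default; `panel_url` is a hypothetical helper, and the base URL and `panelId=2` come from the example above.

```python
from urllib.parse import urlencode

# Solo-panel URL taken from the iframe example in this issue.
BASE = "https://grafana.aaw.cloud.statcan.ca/d-solo/WnBiorG4z/notebook-resources"


def panel_url(namespace: str, panel_id: int = 2,
              time_from: str = "now-6h", time_to: str = "now") -> str:
    """Build a panel URL with the namespace variable and time range set."""
    params = {
        "orgId": 1,
        "var-namespace": namespace,  # assumes the variable is called "namespace"
        "from": time_from,           # relative or epoch-millisecond value
        "to": time_to,
        "panelId": panel_id,
    }
    return f"{BASE}?{urlencode(params)}"


url = panel_url("pat-ledgerwood")
print(url)
```

Absolute epoch-millisecond values, like the `from=1666102521785` in the iframe example, can be passed in the same two parameters.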
Also, did we have a preference for where to embed the metrics in kubeflow?
I believe that the `allow_embedding` configuration (https://grafana.com/docs/grafana/v9.0/setup-grafana/configure-grafana/#allow_embedding) is set to false for our Grafana instance. This is blocking us from embedding metrics in Kubeflow.
@mathis-marcotte I'll put in a request for this to be made configurable for us from upstream (CNS), good catch!
@vexingly Did we also want this one to filter by namespace?
# active notebooks (list)
count(kube_pod_labels{label_notebook_name=~".+"}) by (namespace, pod, label_data_statcan_gc_ca_classification)
Yes, although that one might be less useful to display. It was more of an idea to help users, e.g. "your memory was high here because you had 6 running notebooks"... we can review how the graphs look with the team when it's working!
Ok, got it! And we can always review the dashboard with the graphs directly in Grafana. It's just the embedding of graphs that isn't currently working.
We decided to just add a link to the side menu in kubeflow that will open the grafana dashboard in a new tab, with the selected namespace as a parameter if there is one.
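A rough sketch of the link logic, written in Python only to show the intended behaviour rather than the actual Kubeflow side-menu code. It assumes the dashboard variable is named `namespace`; `dashboard_link` is a hypothetical helper, and the dashboard URL is the one from the example earlier in this issue.

```python
from typing import Optional
from urllib.parse import urlencode

# Full-dashboard URL from the example in this issue.
DASHBOARD = "https://grafana.aaw.cloud.statcan.ca/d/WnBiorG4z/notebook-resources"


def dashboard_link(namespace: Optional[str] = None) -> str:
    """Return the dashboard URL, adding the namespace only when one is selected."""
    params = {"orgId": "1"}
    if namespace:
        # assumes the dashboard variable is called "namespace"
        params["var-namespace"] = namespace
    return f"{DASHBOARD}?{urlencode(params)}"


print(dashboard_link())
print(dashboard_link("pat-ledgerwood"))
```

With no namespace selected, the link falls back to the dashboard's default variable value.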