Pat Ledgerwood
Suggested metrics:
```
# cpu resources
sum(rate(container_cpu_usage_seconds_total{namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}[2m])) by (container, namespace)
sum(kube_pod_container_resource_requests{resource="cpu",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
sum(kube_pod_container_resource_limits{resource="cpu",namespace="pat-ledgerwood", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}) by (container, namespace)
# memory resources...
```
Example: https://grafana.aaw.cloud.statcan.ca/d/WnBiorG4z/notebook-resources?orgId=1
Flattering, but no... we would want to create a variable on the dashboard for the namespace, and then the visualization would use the variable. This should allow the namespace to...
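For illustration, here is a minimal sketch of what the variable-based approach could look like. It assumes a Grafana dashboard variable named `namespace` backed by a Prometheus data source; the variable name and the metric used to populate it are assumptions, not taken from the existing dashboard.

```
# Variable query for the dashboard variable (assumed name: namespace)
label_values(kube_pod_container_resource_requests, namespace)

# cpu usage panel query, with the hard-coded namespace replaced by the variable
sum(rate(container_cpu_usage_seconds_total{namespace="$namespace", pod!~"ml-pipeline.*", container!='vault-agent', container!='istio-proxy'}[2m])) by (container, namespace)
```

The requests and limits queries would take the same `namespace="$namespace"` substitution, so one dashboard could serve any namespace.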
I think a single namespace is fine for this, since it will only be displayed in Kubeflow for a single namespace. We may want the period to be configurable but...
@mathis-marcotte I'll put in a request for this to be made configurable for us from upstream (CNS), good catch!
PR here: https://github.com/StatCan/aaw-kubeflow-containers/pull/488. Also includes https://github.com/StatCan/aaw/issues/1758. Additional fixes for performance and to resolve other testing issues. Having some problems with the current build... https://github.com/StatCan/aaw-kubeflow-containers/actions
Ready for review https://github.com/StatCan/aaw-kubeflow-containers/pull/517
@StanHatko unfortunately no, the issue is that one (or more) of the storage volumes won't detach from the previously attached node that the pod was on. We did manually detach...
The new image / Kubeflow 1.6 does not appear to have resolved this issue. Now that the cluster has stabilized after the upgrade, we will attempt to collect more data on...
Hi @rochellegarner, we have submitted an issue to the cloud team regarding an unstable component of our cluster's networking and are currently waiting for feedback on this. We're not sure...