Recommended way to monitor disk usage
Please answer some short questions which should help us understand your problem / question better:
- Which image of the operator are you using? v1.7.0
- Where do you run it - cloud or metal? Kubernetes or OpenShift? Google Kubernetes Engine (GKE)
- Are you running Postgres Operator in production? No
- Type of issue? Question
I'm hoping this is a simple question with a simple answer. Is there a recommended way to monitor disk space usage? I understand that one can use pg_database_size and related functions; however, as far as I can see, this does not include the disk space used by the log files. To see the disk space actually being used, one must use something like df -h.
This is OK interactively, but how would one include it in a monitoring script? There are solutions on StackOverflow such as this one, but how would one enable the cron job in the pod? Before I start hacking, I thought I would ask whether there is an 'official' way to do this. I have looked through the docs but didn't find anything that addresses this question.
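For illustration, here is a minimal sketch of the gap described above: comparing what pg_database_size reports with what the data volume actually has used and free. The connection parameters and the /home/postgres/pgdata mount path are assumptions and will differ per setup.

```python
# Sketch: compare pg_database_size with actual filesystem usage on the data volume.
# Connection settings and the mount path are assumptions -- adjust for your cluster.
import shutil
import psycopg2

PGDATA_MOUNT = "/home/postgres/pgdata"  # assumed mount point of the data PVC

conn = psycopg2.connect(host="localhost", dbname="postgres", user="postgres")
with conn, conn.cursor() as cur:
    cur.execute("SELECT sum(pg_database_size(datname)) FROM pg_database;")
    db_bytes = cur.fetchone()[0]

# df-style numbers for the whole volume, including WAL, logs, temp files, etc.
usage = shutil.disk_usage(PGDATA_MOUNT)

print(f"databases:          {db_bytes / 1024**3:.2f} GiB")
print(f"volume used/total:  {usage.used / 1024**3:.2f} / {usage.total / 1024**3:.2f} GiB")
```

The difference between the two numbers is roughly the WAL, log and temp-file footprint that pg_database_size does not account for.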
Of course, the reason I would like to monitor disk space usage is because the database stops working when it runs out of disk space.
Thanks for any suggestions.
We do this by periodically calling the bg_mon REST endpoint on port 8080 with ZMON.
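A hedged sketch of the same idea without ZMON, just to show the mechanics: fetch whatever bg_mon serves on port 8080 from a monitoring script and inspect it. The pod address, the path and the shape of the payload below are assumptions; dump the response from your own pods first and adapt the checks to what your bg_mon version actually returns.

```python
# Sketch: poll bg_mon's HTTP endpoint (port 8080) from a monitoring script.
# The URL and the payload layout are assumptions -- inspect the raw response
# once and then extract the disk figures your bg_mon version exposes.
import json
import urllib.request

BG_MON_URL = "http://my-postgres-cluster-0.my-namespace:8080/"  # hypothetical pod address

req = urllib.request.Request(BG_MON_URL, headers={"Accept": "application/json"})
with urllib.request.urlopen(req, timeout=5) as resp:
    body = resp.read().decode()

try:
    stats = json.loads(body)
except json.JSONDecodeError:
    # The root path may serve the HTML UI instead; check which path returns JSON.
    print(body[:500])
else:
    # Once the schema is known, pull out the data-volume free-space figure here
    # and alert on a threshold.
    print(json.dumps(stats, indent=2)[:2000])
```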
I came to the same need. cAdvisor does not yet provide PVC usage/free metrics with containerd, so as a workaround we run the Prometheus node_exporter DaemonSet plus kube-state-metrics. Combining node_filesystem_avail_bytes, kube_persistentvolumeclaim_info and kube_pod_spec_volumes_persistentvolumeclaims_info gives the available bytes on the PVC for all pods/PVCs. The query is not pretty, but it gives me what I need, and it works well on EKS and on one on-premise K8s cluster. When combined with kube_persistentvolumeclaim_resource_requests_storage_bytes you can also get the percentage of used space.
sum without (device, instance, mountpoint, uid, account, fstype, Namespace, app, chart,
             component, controller_revision_hash, heritage, job, pod_template_generation,
             release, region) (
  (
    kube_pod_spec_volumes_persistentvolumeclaims_info{k8s_cluster=~".+"}
    * on (persistentvolumeclaim, k8s_cluster) group_left(volumename)
    kube_persistentvolumeclaim_info{}
  )
  * on (uid, volumename, k8s_cluster) group_right(persistentvolumeclaim, pod, volume)
  label_replace(
    label_replace(
      node_filesystem_avail_bytes{mountpoint=~".*(pvc-[a-z0-9\\-]*).*"},
      "volumename", "$1", "mountpoint", ".*(pvc-[a-z0-9\\-]*).*"
    ),
    "uid", "$1", "mountpoint", ".*/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/.*"
  )
) / 1024 / 1024 / 1024
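To tie this back to the original question about a monitoring script: a rough sketch that evaluates a PromQL expression through the Prometheus HTTP API and flags PVCs with little space left. The Prometheus URL and the 5 GiB threshold are assumptions, and PROMQL_EXPR stands in for the query above (or the percentage variant using kube_persistentvolumeclaim_resource_requests_storage_bytes).

```python
# Sketch: evaluate the PVC free-space query via Prometheus' HTTP API and warn
# when any PVC drops below a threshold. URL and threshold are assumptions.
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus.monitoring:9090"  # assumed in-cluster address
PROMQL_EXPR = "..."                                   # paste the query from above here
THRESHOLD_GIB = 5

params = urllib.parse.urlencode({"query": PROMQL_EXPR})
with urllib.request.urlopen(f"{PROMETHEUS_URL}/api/v1/query?{params}", timeout=10) as resp:
    result = json.load(resp)["data"]["result"]

for sample in result:
    labels = sample["metric"]
    free_gib = float(sample["value"][1])  # instant vector sample: [timestamp, value]
    if free_gib < THRESHOLD_GIB:
        print(f"LOW SPACE: pvc={labels.get('persistentvolumeclaim')} "
              f"pod={labels.get('pod')} free={free_gib:.1f} GiB")
```

The same expression can of course be wired into a Prometheus alerting rule instead of a script; the script form just answers the "how do I put this in cron" part of the question.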
I also excluded some volumes that are, IMHO, not needed from node_exporter's filesystem monitoring:
collector.filesystem.mount-points-exclude: ^/(dev|proc|run.*|sys|etc.*|opt|local|mnt|var/lib/docker/.+|var/lib/containers/storage/.+|boot.*|(local/)?var/lib/bottlerocket|(local/)?var/lib/.*(kubernetes.io~projected|kubernetes.io~secret|kubernetes.io~empty-dir|volume-subpaths).*)($|/)