
<install> statefulset , FailedScheduling

Open wiluen opened this issue 1 year ago • 19 comments

It is not easy to install Robusta for me. When I install Robusta using Helm, these pods cannot start:

alertmanager-robusta-kube-prometheus-st-alertmanager-0   0/2   Pending   0   18s
prometheus-robusta-kube-prometheus-st-prometheus-0       0/2   Pending   0   8s

The event is:

Warning FailedScheduling 49s default-scheduler 0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.

And kubectl get pv shows nothing. What's wrong with it?

wiluen avatar Dec 10 '24 08:12 wiluen

Hi 👋, thanks for opening an issue! Please note, it may take some time for us to respond, but we'll get back to you as soon as we can!

  • 💬 Slack Community: Join Robusta team and other contributors on Slack here.
  • 📖 Docs: Find our documentation here.
  • 🎥 YouTube Channel: Watch our videos here.

github-actions[bot] avatar Dec 10 '24 08:12 github-actions[bot]

It seems there is no StorageClass and no PV.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: "2024-12-01T11:55:07Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    alertmanager: robusta-kube-prometheus-st-alertmanager
    app.kubernetes.io/instance: robusta-kube-prometheus-st-alertmanager
    app.kubernetes.io/managed-by: prometheus-operator
    app.kubernetes.io/name: alertmanager
  name: alertmanager-robusta-kube-prometheus-st-alertmanager-db-alertmanager-robusta-kube-prometheus-st-alertmanager-0
  namespace: default
  resourceVersion: "398335"
  uid: 9c481da6-803e-43cf-8f34-c23137203bd0
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeMode: Filesystem
status:
  phase: Pending

wiluen avatar Dec 10 '24 09:12 wiluen

Hi @wiluen ,

Thanks for reporting this. Which k8s distribution are you using? Is it on-prem, or a public cloud (Amazon, Google, other)?

This might happen if the cluster doesn't have a storage provisioner (the component responsible for creating a PV from the PVC).

How do you typically create persistent volumes?

Can you share the output of kubectl get storageclass?
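If it turns out there is no provisioner at all, one workaround on a lab cluster is to create a PersistentVolume manually so the pending claim can bind. A minimal hostPath sketch (the name, path, and capacity here are placeholders, not taken from your cluster):

```yaml
# Minimal manual PV sketch; the capacity and access mode must satisfy the
# pending PVC (ReadWriteOnce, 10Gi in the claim shown above).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-manual-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/alertmanager   # must exist on the node that will run the pod
```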

arikalon1 avatar Dec 10 '24 10:12 arikalon1

Thanks for your reply. My k8s is on-prem. Yes, I don't have a StorageClass; kubectl get storageclass returns nothing. It is a lab cluster on campus. I just created a PV manually and it bound to the PVC.

But another question: the PVC prometheus-robusta-kube-prometheus-st-prometheus-db-prometheus-robusta-kube-prometheus-st-prometheus-0 requests 100Gi, but my VMs don't have a 100Gi disk, and I also cannot edit the resources.requests.storage field directly. What can I do?

spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi # I want a smaller value
  volumeMode: Filesystem

wiluen avatar Dec 10 '24 10:12 wiluen

Hi @wiluen

you can change the storage size in the generated_values.yaml file:

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      storageSpec:
        volumeClaimTemplate:
          spec:
            resources:
              requests:
                storage: 10Gi
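After editing generated_values.yaml, re-apply the chart so the new size takes effect, roughly (assuming the release is named robusta and was installed from the Robusta Helm repo):

```bash
helm upgrade robusta robusta/robusta -f ./generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>
```

Note that if the 100Gi PVC was already created, the StatefulSet won't shrink it automatically; you may need to delete the old PVC and let the operator recreate things for the smaller request to apply.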

arikalon1 avatar Dec 10 '24 11:12 arikalon1

Thanks very much, @arikalon1! There is still a bug with the glusterfs images:

docker pull quay.io/gluster/gluster-centos:latest
latest: Pulling from gluster/gluster-centos
[DEPRECATION NOTICE] Docker Image Format v1 and Docker Image manifest version 2, schema 1 support is disabled by default and will be removed in an upcoming release. Suggest the author of quay.io/gluster/gluster-centos:latest to upgrade the image to the OCI Format or Docker Image manifest v2, schema 2. More information at https://docs.docker.com/go/deprecated-image-specs/

wiluen avatar Dec 10 '24 11:12 wiluen

@arikalon1 How can I get all of the configurable fields in generated_values.yaml?

wiluen avatar Dec 10 '24 11:12 wiluen

Hi @wiluen

Where do we have a reference to gluster-centos in Robusta? Can you share more details?

Regarding the configuration options, you can see most of them in our default values.yaml file.

Robusta also has a dependency on kube-prometheus-stack, which you can likewise configure via the Robusta generated_values.yaml file. The config values of kube-prometheus-stack can be found here.
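For example, you can dump the chart's full default values locally and browse every configurable field; a quick sketch (the repo URL is the one from the Robusta install docs):

```bash
helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
helm show values robusta/robusta > robusta-defaults.yaml
```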

arikalon1 avatar Dec 10 '24 13:12 arikalon1

Hi @arikalon1

[screenshots attached]

Actually, I don't know what it was, but it appears in my k8s cluster. I thought it was part of Robusta. There are also some Jobs created when I deploy Robusta, and I don't know what they are. So, looking at my screenshot, am I missing some important pods?

wiluen avatar Dec 10 '24 13:12 wiluen

hey @wiluen

The glusterfs pods look like some DaemonSet, but they are not part of Robusta. When Robusta starts, it runs an efficiency scan, krr. You can later see the results in the UI. It helps with right-sizing your k8s workloads (setting the correct resource requests and limits).
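(Side note: krr is also a standalone CLI, so you can run the same efficiency scan from a workstation against your current kubeconfig; a rough sketch only, see the robusta-dev/krr repo for the exact install steps and flags:)

```bash
krr simple
```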

Looks like your Robusta installation is up and healthy!

arikalon1 avatar Dec 10 '24 14:12 arikalon1

Hi @arikalon1, there are so many problems. Thank you for your patient answers.

I finished the install; it is easy to install if there are no network problems. I used enablePrometheusStack: true and deployed a crashing pod.

(1) But I can't connect to Prometheus. [screenshot]

(2) In the logs of the pod prometheus-robusta-kube-prometheus-st-prometheus-0 there is an error: [screenshot]

ts=2024-12-14T07:18:39.171Z caller=notifier.go:530 level=error component=notifier alertmanager=http://172.20.245.213:9093/api/v2/alerts count=1 msg="Error sending alert" err="Post "http://172.20.245.213:9093/api/v2/alerts": context deadline exceeded"

(3) Besides, I see the AI can summarize logs, but in the HolmesGPT UI it can't connect to the GPT. [screenshots] The log summary itself seems right.

wiluen avatar Dec 14 '24 08:12 wiluen

Hi @wiluen

Do you have network policies in your cluster? The Robusta components need to be able to connect to one another.

Can you share the robusta-runner and Alertmanager logs? Does the IP Prometheus is trying to connect to really belong to Alertmanager?
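One quick way to verify connectivity from inside the cluster is a throwaway curl pod; a sketch (the curl image and timeout are just example choices, and the IP:port is taken from the error above):

```bash
# Does the Alertmanager address from the Prometheus error answer at all?
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv --max-time 5 http://172.20.245.213:9093/api/v2/status
```

If this also times out, the problem is network-level rather than an Alertmanager misconfiguration.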

arikalon1 avatar Dec 14 '24 08:12 arikalon1

Hi @arikalon1 [screenshot]

The log of robusta-runner is:

ERROR Couldn't connect to Prometheus found under http://robusta-kube-prometheus-st-prometheus.default.svc.cluster.local:9090

wiluen avatar Dec 14 '24 08:12 wiluen

hey @wiluen

Can you share kubectl get pods -o wide so the pod IPs are visible? Trying to check whether this is indeed the Alertmanager pod IP.

Do you have network policies defined in the cluster?

arikalon1 avatar Dec 14 '24 09:12 arikalon1

Hi @arikalon1, the result is: [screenshot]

I don't think there are any additional network restrictions, because my cluster is just a simple testing cluster.

wiluen avatar Dec 14 '24 09:12 wiluen

The IP seems right, but Prometheus is not able to connect to Alertmanager. In addition, it looks like robusta-runner is not able to connect to Prometheus or Holmes.

I suspect there are some network restrictions in the cluster.

Can you share kubectl get networkpolicies -A?

arikalon1 avatar Dec 14 '24 09:12 arikalon1

Nothing here: [screenshot]

wiluen avatar Dec 14 '24 09:12 wiluen

Hi @arikalon1, what are the IPs and ports of Alertmanager, Prometheus, and Holmes? I'm not sure. If I knew, I might have a way to solve it.

wiluen avatar Dec 14 '24 09:12 wiluen

You can see them in the pods list you shared.
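For reference, the errors above show Alertmanager being reached on port 9093 and Prometheus on port 9090 (their defaults). You can also list the in-cluster service addresses and ports directly, for example:

```bash
kubectl get svc -A | grep -Ei 'alertmanager|prometheus|holmes'
```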

arikalon1 avatar Dec 14 '24 10:12 arikalon1