Failed to deploy px to EKS

Open hardy4yooz opened this issue 4 years ago • 13 comments

Describe the bug I can't finish the Pixie installation with px deploy. I found the error message in the source code: https://github.com/pixie-labs/pixie/blob/1ec75d3b13/src/pixie_cli/pkg/cmd/deploy.go

The main code is as follows:

for !clusterIDExists { // Wait for secret to be updated with clusterID.
	select {
	case <-ctx.Done():
		// Using log.Fatal rather than CLI log in order to track this unexpected error in Sentry.
		log.Fatal("Timed out waiting for cluster ID assignment")
	case <-t.C:
		s := k8s.GetSecret(clientset, namespace, "pl-cluster-secrets")
		if cID, ok := s.Data["cluster-id"]; ok {
			clusterID = uuid.FromStringOrNil(string(cID))
			clusterIDExists = true
		}
	}
}

It fetches the secret pl-cluster-secrets and then reads cluster-id from it, but in my secret I can only find the sentry-dsn value.

To Reproduce Steps to reproduce the behavior: px deploy

Screenshots image

Logs

px deploy
Pixie CLI

Running Cluster Checks:
 ✔    Kernel version > 4.14.0
 ✔    Cluster type is supported
 ✔    K8s version > 1.12.0
 ✔    Kubectl > 1.10.0 is present
 ✔    User can create namespace
 ✔    Cluster type is in list of known supported types
Installing version: 0.7.8
Generating YAMLs for Pixie
Deploying Pixie to the following cluster: arn:aws-cn:eks:cn-north-1:707589281697:cluster/huoxian-test
Is the cluster correct? (y/n) [y] : y
Found 4 nodes
 ✔    Creating namespace
 ✔    Creating namespace
 ✔    Deleting stale Pixie objects, if any
 ✔    Deploying secrets and configmaps
 ✔    Deploying dependencies: NATS
 ✔    Deploying Cloud Connector
 ⠙    Waiting for Cloud Connector to come online
FATA[0332] Timed out waiting for cluster ID assignment

App information (please complete the following information):

  • Pixie version : 0.7.8
  • K8s cluster version : v1.18.9

hardy4yooz avatar May 12 '21 11:05 hardy4yooz

Can you share the output of px collect-logs? For context, the cloudConnector pod is responsible for populating the cluster-id once it has connected to Pixie Cloud. There is likely something preventing the pod from starting up/connecting.

aimichelle avatar May 12 '21 21:05 aimichelle

> Can you share the output of px collect-logs? For context, the cloudConnector pod is responsible for populating the cluster-id once it has connected to Pixie Cloud. There is likely something preventing the pod from starting up/connecting.

Pixie CLI
WARN[0002] Failed to log pod: cert-provisioner-job-khkx5  error="container \"provisioner\" in pod \"cert-provisioner-job-khkx5\" is waiting to start: trying and failing to pull image"
WARN[0002] Failed to log pod: kelvin-788c78675f-949vh    error="container \"app\" in pod \"kelvin-788c78675f-949vh\" is waiting to start: PodInitializing"
WARN[0002] Failed to log pod: pl-nats-1                  error="container \"nats\" in pod \"pl-nats-1\" is waiting to start: ContainerCreating"
WARN[0002] Failed to log pod: vizier-certmgr-67877bb9b4-rmnkl  error="container \"app\" in pod \"vizier-certmgr-67877bb9b4-rmnkl\" is waiting to start: ContainerCreating"
WARN[0002] Failed to log pod: vizier-cloud-connector-54df554584-56h92  error="container \"app\" in pod \"vizier-cloud-connector-54df554584-56h92\" is waiting to start: ContainerCreating"
WARN[0003] Failed to log pod: vizier-metadata-0          error="container \"app\" in pod \"vizier-metadata-0\" is waiting to start: PodInitializing"
WARN[0003] Failed to log pod: vizier-pem-7qt25           error="container \"pem\" in pod \"vizier-pem-7qt25\" is waiting to start: PodInitializing"
WARN[0003] Failed to log pod: vizier-pem-99qjn           error="container \"pem\" in pod \"vizier-pem-99qjn\" is waiting to start: PodInitializing"
WARN[0004] Failed to log pod: vizier-pem-m2skc           error="container \"pem\" in pod \"vizier-pem-m2skc\" is waiting to start: PodInitializing"
WARN[0004] Failed to log pod: vizier-pem-zgdx7           error="container \"pem\" in pod \"vizier-pem-zgdx7\" is waiting to start: PodInitializing"
WARN[0005] Failed to log pod: vizier-proxy-7c45cd454-zkbwb  error="container \"app\" in pod \"vizier-proxy-7c45cd454-zkbwb\" is waiting to start: ContainerCreating"
WARN[0005] Failed to log pod: vizier-query-broker-57d74999ff-k5cbp  error="container \"app\" in pod \"vizier-query-broker-57d74999ff-k5cbp\" is waiting to start: PodInitializing"
Logs written to pixie_logs_20210513163214.zip
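With this many warnings it can help to group the pods by the reason they are waiting. The following is a rough sketch that assumes the warning lines keep the exact shape shown above; warnLine and groupByReason are hypothetical helper names, not part of the Pixie CLI.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// warnLine captures the pod name and the waiting reason from lines shaped
// like the "Failed to log pod" warnings above. The format is assumed from
// this one log sample, not taken from any documented contract.
var warnLine = regexp.MustCompile(`Failed to log pod: (\S+)\s+error=".*is waiting to start: ([^"]+)"`)

// groupByReason maps each waiting reason to the pods that reported it.
// Lines that do not match the expected shape are ignored.
func groupByReason(lines []string) map[string][]string {
	out := make(map[string][]string)
	for _, l := range lines {
		if m := warnLine.FindStringSubmatch(l); m != nil {
			out[m[2]] = append(out[m[2]], m[1])
		}
	}
	return out
}

func main() {
	// A few lines copied from the output above.
	lines := []string{
		`WARN[0002] Failed to log pod: cert-provisioner-job-khkx5  error="container \"provisioner\" in pod \"cert-provisioner-job-khkx5\" is waiting to start: trying and failing to pull image"`,
		`WARN[0002] Failed to log pod: pl-nats-1                  error="container \"nats\" in pod \"pl-nats-1\" is waiting to start: ContainerCreating"`,
		`WARN[0003] Failed to log pod: vizier-metadata-0          error="container \"app\" in pod \"vizier-metadata-0\" is waiting to start: PodInitializing"`,
		`Logs written to pixie_logs_20210513163214.zip`,
	}

	groups := groupByReason(lines)
	reasons := make([]string, 0, len(groups))
	for r := range groups {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)
	for _, r := range reasons {
		fmt.Printf("%s: %v\n", r, groups[r])
	}
}
```

Grouping this way separates the one pod that cannot pull its image from the pods that are still in ContainerCreating or PodInitializing.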

hardy4yooz avatar May 13 '21 08:05 hardy4yooz

Can you attach the generated pixie_logs_20210513163214.zip?

aimichelle avatar May 13 '21 18:05 aimichelle

Are you using AWS default CNI?

marinoborges avatar May 14 '21 01:05 marinoborges

> Are you using AWS default CNI?

Yes. The AWS region is cn-north-1.

hardy4yooz avatar May 14 '21 03:05 hardy4yooz

> Can you attach the generated pixie_logs_20210513163214.zip?

I am not sure whether the logs contain sensitive information. Can I send them to you by email or some other way?

hardy4yooz avatar May 14 '21 03:05 hardy4yooz

@aimichelle I am also getting exactly the same error on a private cloud. image image When I run "px collect-logs", the resulting nodes.log and services.log are empty.

BTW, the server cannot access the internet. I found that px-operator/6a7202426e40a08e4a421d7ffb76fd866c4c82d8937874b36ed663--1-hdqsc hit a pullImageError, so I replaced the image URL in the YAML and then manually ran 'kubectl apply -f *--1-hdqsc'. I'm not sure whether this operation caused the error.

zongjiangU avatar Nov 23 '21 04:11 zongjiangU

Any update on this, or any debugging methods? I am getting exactly the same error and am confused. The advice is to check vizier-cloud-connector, but I don't see any pod with this name.

liyan-ah avatar Mar 03 '22 06:03 liyan-ah

> Any update on this, or any debugging methods? I am getting exactly the same error and am confused. The advice is to check vizier-cloud-connector, but I don't see any pod with this name.

Yeah, same here!

wnz27 avatar Sep 29 '23 13:09 wnz27

I ran px collect-logs and the output was: WARN[0006] Failed to log pod: pixie-operator-index-5nb4k error="container "registry-server" in pod "pixie-operator-index-5nb4k" is waiting to start: trying and failing to pull image"

wnz27 avatar Sep 29 '23 13:09 wnz27

Screenshot 2023-09-29 21 04 35

wnz27 avatar Sep 29 '23 13:09 wnz27

@wnz27 can you provide the kubectl describe output of those pods? That should help explain why the image pull is failing.

ddelnano avatar Sep 29 '23 14:09 ddelnano

> @wnz27 can you provide the kubectl describe output of those pods? That should help explain why the image pull is failing.

It may be a cluster network problem. I will try later, and if it succeeds I'll comment here.

wnz27 avatar Oct 02 '23 15:10 wnz27