
bug(nodeadm): Unable to create nodepool with RAID0 localStorage strategy in version >= v20250620

Open samarthkathal opened this issue 8 months ago • 2 comments

What happened: We are using the local NVMe SSDs available on i4i.xlarge instances as ephemeral storage for our Kubernetes workloads, specifically for an Elasticsearch deployment that requires low-latency I/O. This is achieved using the localStorage: RAID0 strategy within the node.eks.aws/v1alpha1 NodeConfig manifest, which is passed via user data to the EKS-optimized Amazon Linux 2023 (AL2023) AMI.

A previously successful nodepool, configured with the AMI version 1.31.7-20250610, failed to upgrade when we attempted to change its release version to 1.31.7-20250620. The nodepool creation process became stuck and eventually failed. We also tried a fresh deployment with the new AMI, which exhibited the same failure.

What you expected to happen: The nodepool upgrade should have completed successfully, with the new nodes joining the cluster and their local NVMe SSDs configured as a RAID0 volume that is exposed as ephemeral storage for the kubelet. Creating a fresh nodepool should likewise have succeeded, with nodes recognizing the local SSD as ephemeral storage.

How to reproduce it (as minimally and precisely as possible):

  1. Use the provided Terraform configuration to create a launch template and a managed EKS nodepool with i4i.xlarge instances and ami_type = "AL2023_x86_64_STANDARD".

  2. Set the initial release_version to 1.31.7-20250610.

  3. Observe that the nodepool successfully launches and the nodes join the cluster. You can confirm that the node is able to recognize and use the local SSD as pod ephemeral storage.

  4. Update the release_version to >=1.31.7-20250620 and apply the changes.

  5. Observe that the new nodes fail to initialize and get stuck in the Creating state indefinitely before eventually failing.
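The check in step 3 can be done with something like the following (NODE_NAME is a placeholder; substitute one of the nodepool's nodes):

```shell
# Confirm the node advertises the local NVMe capacity as ephemeral storage.
kubectl get node "$NODE_NAME" \
  -o jsonpath='{.status.capacity.ephemeral-storage}{"\n"}'

# On a working i4i.xlarge node this should report roughly the ~937 GB
# instance-store device rather than the 50 GiB root EBS volume.
```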

nodepool.tf

resource "aws_eks_node_group" "al23_raid0_1_31_7-20250610" {
  cluster_name    = data.aws_eks_cluster.atom_eks.name
  node_group_name = "al23-raid0-1-31-7-20250610"
  node_role_arn   = var.node_pool_role_arn
  subnet_ids      = var.atom_private_subnet_ids
  instance_types  = ["i4i.xlarge"]

  ami_type = "AL2023_x86_64_STANDARD"
  release_version = "1.31.7-20250610"

  launch_template {
    id      = aws_launch_template.al23_raid0.id
    version = aws_launch_template.al23_raid0.latest_version
  }

  scaling_config {
    desired_size = 2
    max_size     = 8
    min_size     = 2
  }

  lifecycle {
    ignore_changes        = [scaling_config[0].desired_size]
    create_before_destroy = true
  }

  labels = {
    "sage_es" = "true"
    "raid0" = "v20250610"
  }

  taint {
    key    = "dedicated"
    value  = "sage_es"
    effect = "NO_SCHEDULE"
  }

  # Required for local disk RAID0
  update_config {
    max_unavailable = 1
  }
}

resource "aws_launch_template" "al23_raid0" {
  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size = 50
    }
  }

  user_data = base64encode(templatefile("${path.module}/i4i_nvme_nodeadm.userdata_raid0.tftpl", {
    CLUSTER_NAME        = data.aws_eks_cluster.atom_eks.name
    API_SERVER_ENDPOINT = data.aws_eks_cluster.atom_eks.endpoint
    CA_AUTHORITY_B64    = data.aws_eks_cluster.atom_eks.certificate_authority[0].data
    CLUSTER_CIDR        = data.aws_eks_cluster.atom_eks.kubernetes_network_config[0].service_ipv4_cidr
  }))

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required"
    http_put_response_hop_limit = 1
    instance_metadata_tags      = "enabled"
  }

  vpc_security_group_ids = [var.atom_security_group_id]
}

i4i_nvme_nodeadm.userdata_raid0.tftpl

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: ${CLUSTER_NAME}
    apiServerEndpoint: ${API_SERVER_ENDPOINT}
    certificateAuthority: ${CA_AUTHORITY_B64}
    cidr: ${CLUSTER_CIDR}
  instance:
    localStorage:
      strategy: RAID0
--BOUNDARY--
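Once a node booted from this template, the RAID0 setup could be verified on-host (via SSH or SSM) with something like:

```shell
# Check that mdadm assembled the array from the NVMe instance-store disk(s);
# a healthy node shows an active md device (e.g. md/kubernetes).
cat /proc/mdstat

# Confirm the kubelet and containerd state directories are mounted
# onto the array rather than the root EBS volume.
findmnt /var/lib/kubelet
findmnt /var/lib/containerd
```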

There is a comment on a similar, older issue that is also worth referencing: https://github.com/awslabs/amazon-eks-ami/issues/2122#issuecomment-2904174482

Environment:

  • AWS Region: us-west-2
  • Instance Type(s): i4i.xlarge
  • Cluster Kubernetes version: 1.31
  • Node Kubernetes version: 1.31
  • AMI Version: 1.31.7-20250610

— samarthkathal, Aug 19 '25 18:08

Unfortunately, I wasn't able to reproduce this using these assets with a 1.31 cluster I created. Nodes landed on amazon-eks-node-al2023-x86_64-standard-1.31-v20250620 and were still able to join the cluster. I could be doing something wrong, but I'm not seeing a bug in the actual code path that might explain any regression.

> get stuck in the Creating state indefinitely

Are you able to get nodeadm's logs (journalctl -u nodeadm-config -u nodeadm-run) on any of the nodes that get stuck? On my end, I'm able to see that the disk setup itself was successful.
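If SSH or SSM access to a stuck node is available, the relevant units can be dumped to a file with something like (adding the kubelet unit, which often shows where bootstrap stalls):

```shell
# Capture nodeadm's config and run phases plus kubelet output,
# with ISO timestamps, for attaching to the issue.
journalctl -u nodeadm-config -u nodeadm-run -u kubelet \
  --no-pager -o short-iso > /tmp/nodeadm-debug.log
```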

Disk Setup Logs
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2152]: {"level":"info","ts":1755710383.698419,"caller":"init/init.go:114","msg":"Setting up system aspects..."}
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2152]: {"level":"info","ts":1755710383.698438,"caller":"init/init.go:117","msg":"Setting up system aspect..","name":"local-disk"}
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2179]: mdadm: chunk size defaults to 512K
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2179]: mdadm: Defaulting to version 1.2 metadata
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2179]: mdadm: array /dev/md/kubernetes started.
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]: meta-data=/dev/md/kubernetes     isize=512    agcount=32, agsize=7147776 blks
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]:          =                       sectsz=512   attr=2, projid32bit=1
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]:          =                       crc=1        finobt=1, sparse=1, rmapbt=0
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]:          =                       reflink=1    bigtime=1 inobtcount=1
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]: data     =                       bsize=4096   blocks=228726656, imaxpct=25
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]:          =                       sunit=128    swidth=128 blks
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]: naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]: log      =internal log           bsize=4096   blocks=111688, version=2
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]:          =                       sectsz=512   sunit=8 blks, lazy-count=1
Aug 20 17:19:43 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2196]: realtime =none                   extsz=4096   blocks=0, rtextents=0
Aug 20 17:19:44 ip-192-168-158-88.us-west-2.compute.internal systemctl[2205]: Created symlink /etc/systemd/system/multi-user.target.wants/mnt-k8s\x2ddisks-0.mount → /etc/systemd/system/mnt-k8s\x2ddisks-0.mount.
Aug 20 17:19:44 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2171]: Copying /var/lib/kubelet/ to /mnt/k8s-disks/0/kubelet/
Aug 20 17:19:44 ip-192-168-158-88.us-west-2.compute.internal systemctl[2259]: Created symlink /etc/systemd/system/multi-user.target.wants/var-lib-kubelet.mount → /etc/systemd/system/var-lib-kubelet.mount.
Aug 20 17:19:44 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2171]: Copying /var/lib/containerd/ to /mnt/k8s-disks/0/containerd/
Aug 20 17:19:44 ip-192-168-158-88.us-west-2.compute.internal systemctl[2291]: Created symlink /etc/systemd/system/multi-user.target.wants/var-lib-containerd.mount → /etc/systemd/system/var-lib-containerd.mount.
Aug 20 17:19:44 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2171]: Copying /var/log/pods/ to /mnt/k8s-disks/0/pods/
Aug 20 17:19:45 ip-192-168-158-88.us-west-2.compute.internal systemctl[2331]: Created symlink /etc/systemd/system/multi-user.target.wants/var-log-pods.mount → /etc/systemd/system/var-log-pods.mount.
Aug 20 17:19:45 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2171]: Successfully setup RAID-0 consisting of /dev/nvme1n1
Aug 20 17:19:45 ip-192-168-158-88.us-west-2.compute.internal nodeadm[2152]: {"level":"info","ts":1755710385.2556493,"caller":"init/init.go:121","msg":"Set up system aspect","name":"local-disk"}

— ndbaker1, Aug 20 '25 17:08

This issue is stale because it has been open for 60 days with no activity. Remove the stale label or comment to avoid closure in 14 days.

— github-actions[bot], Nov 20 '25 16:11