Node.js Executable Not Found When Using gcr.io/kaniko-project/executor:debug Base Image
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.9.3
Deployment Method
Helm
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Deploy the `gha-runner-scale-set` chart with `containerMode.type: kubernetes` enabled (see the install sketch below).
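A minimal install sketch (release name, namespace, and values file name are placeholders; the values used are shown under Additional Context):

helm install test-runners \
  --namespace arc-runners --create-namespace \
  -f values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set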
Describe the bug
I'm currently running GitHub Actions runners with containerMode.type: kubernetes enabled, using two different job container images: ubuntu:latest and gcr.io/kaniko-project/executor:debug.
When I use gcr.io/kaniko-project/executor:debug as the job container image, the Node.js executable is not found at the expected path and the workflow fails immediately.
Upon inspecting the container, the Node.js files are present but the binary cannot be executed, which causes the step to fail with the following error:
sh: /__e/node20/bin/node: not found
When I exec into the pod using the gcr.io/kaniko-project/executor:debug image:
/__e # ls
node16 node20
/__e #
/__e # cd node20/
/__e/node20 # ls
CHANGELOG.md LICENSE README.md bin include lib share
/__e/node20 #
/__e/node20 # cd bin/
/__e/node20/bin # ls
corepack node npm npx
/__e/node20/bin #
/__e/node20/bin # pwd
/__e/node20/bin
/__e/node20/bin # /__e/node20/bin/node
sh: /__e/node20/bin/node: not found
Workflow File Using gcr.io/kaniko-project/executor:debug:
name: Build and Deploy
on:
  workflow_call:
    inputs:
      branch:
        required: true
        type: string
        default: 'main'
      build_registry:
        required: true
        type: string
jobs:
  arm-build:
    runs-on: [test-runners]
    container:
      image: gcr.io/kaniko-project/executor:debug
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
Working Example with ubuntu:latest
When using the ubuntu:latest image, the Node.js executable is present and the workflow runs as expected.
name: Build and Deploy
on:
  workflow_call:
    inputs:
      branch:
        required: true
        type: string
        default: 'main'
      build_registry:
        required: true
        type: string
jobs:
  arm-build:
    runs-on: [test-runners]
    container:
      image: ubuntu:latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
Node.js Executable Found:
root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin# ls
corepack node npm npx
root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin#
root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin#
root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin# pwd
/__e/node20/bin
root@test-runners-px7pq-runner-2wsv4-workflow:/__e/node20/bin# /__e/node20/bin/node
Welcome to Node.js v20.13.1.
Type ".help" for more information.
>
This issue is a major blocker, as it prevents us from using a customized image that matches our requirements. We need to run job containers from our own predefined images, with custom tools and packages installed.
I need help resolving this so that the Node.js executable can be found and used correctly when using the gcr.io/kaniko-project/executor:debug image. Any insights or solutions would be greatly appreciated.
Describe the expected behavior
The workflow should run successfully and Node.js-based actions should execute correctly, even when using a customized image such as gcr.io/kaniko-project/executor:debug. The Node.js executable should be found and functional, so that all steps in the workflow proceed without errors.
Additional Context
## githubConfigUrl is the GitHub url for where you want to configure runners
## ex: https://github.com/myorg/myrepo or https://github.com/myorg
githubConfigUrl: "https://github.com/"
## githubConfigSecret is the k8s secrets to use when auth with GitHub API.
## You can choose to use GitHub App or a PAT token
githubConfigSecret:
### GitHub Apps Configuration
## NOTE: IDs MUST be strings, use quotes
#github_app_id: ""
#github_app_installation_id: ""
#github_app_private_key: |
### GitHub PAT Configuration
# github_token: ""
## If you have a pre-define Kubernetes secret in the same namespace the gha-runner-scale-set is going to deploy,
## you can also reference it via `githubConfigSecret: pre-defined-secret`.
## You need to make sure your predefined secret has all the required secret data set properly.
## For a pre-defined secret using GitHub PAT, the secret needs to be created like this:
## > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_token='ghp_your_pat'
## For a pre-defined secret using GitHub App, the secret needs to be created like this:
## > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_app_id=123456 --from-literal=github_app_installation_id=654321 --from-literal=github_app_private_key='-----BEGIN CERTIFICATE-----*******'
githubConfigSecret: github-token
## proxy can be used to define proxy settings that will be used by the
## controller, the listener and the runner of this scale set.
#
# proxy:
# http:
# url: http://proxy.com:1234
# credentialSecretRef: proxy-auth # a secret with `username` and `password` keys
# https:
# url: http://proxy.com:1234
# credentialSecretRef: proxy-auth # a secret with `username` and `password` keys
# noProxy:
# - example.com
# - example.org
# maxRunners is the max number of runners the autoscaling runner set will scale up to.
# maxRunners: 5
# minRunners is the min number of idle runners. The target number of runners created will be
# calculated as a sum of minRunners and the number of jobs assigned to the scale set.
minRunners: 2
runnerGroup: "test-runners"
# ## name of the runner scale set to create. Defaults to the helm release name
runnerScaleSetName: "test-runners"
## A self-signed CA certificate for communication with the GitHub server can be
## provided using a config map key selector. If `runnerMountPath` is set, for
## each runner pod ARC will:
## - create a `github-server-tls-cert` volume containing the certificate
## specified in `certificateFrom`
## - mount that volume on path `runnerMountPath`/{certificate name}
## - set NODE_EXTRA_CA_CERTS environment variable to that same path
## - set RUNNER_UPDATE_CA_CERTS environment variable to "1" (as of version
## 2.303.0 this will instruct the runner to reload certificates on the host)
##
## If any of the above had already been set by the user in the runner pod
## template, ARC will observe those and not overwrite them.
## Example configuration:
#
# githubServerTLS:
# certificateFrom:
# configMapKeyRef:
# name: config-map-name
# key: ca.crt
# runnerMountPath: /usr/local/share/ca-certificates/
## Container mode is an object that provides out-of-box configuration
## for dind and kubernetes mode. Template will be modified as documented under the
## template object.
##
## If any customization is required for dind or kubernetes mode, containerMode should remain
## empty, and configuration should be applied to the template.
containerMode:
  type: "kubernetes" ## type can be set to dind or kubernetes
  ## the following is required when containerMode.type=kubernetes
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    # For local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to provide dynamic provision volume with storageClassName: openebs-hostpath
    storageClassName: "gp3"
    resources:
      requests:
        storage: 5Gi
  # kubernetesModeServiceAccount:
  #   annotations:
## listenerTemplate is the PodSpec for each listener Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
listenerTemplate:
  spec:
    nodeSelector:
      purpose: github-actions
    tolerations:
      - key: purpose
        operator: Equal
        value: github-actions
        effect: NoSchedule
    containers:
      # Use this section to append additional configuration to the listener container.
      # If you change the name of the container, the configuration will not be applied to the listener,
      # and it will be treated as a side-car container.
      - name: listener
        resources:
          limits:
            cpu: "500m"
            memory: "500Mi"
          requests:
            cpu: "250m"
            memory: "250Mi"
        # securityContext:
        #   runAsUser: 1000
      # # Use this section to add the configuration of a side-car container.
      # # Comment it out or remove it if you don't need it.
      # # Spec for this container will be applied as is without any modifications.
      # - name: side-car
      #   image: example-sidecar
## template is the PodSpec for each runner Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec
template:
template:
spec:
containers:
- name: runner
image: ghcr.io/actions/actions-runner:latest
command: ["/home/runner/run.sh"]
env:
- name: ACTIONS_RUNNER_CONTAINER_HOOKS
value: /home/runner/k8s/index.js
- name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
value: /etc/config/runner-template.yaml
- name: ACTIONS_RUNNER_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "true"
volumeMounts:
- name: work
mountPath: /home/runner/_work
- mountPath: /etc/config
name: hook-template
volumes:
- name: hook-template
configMap:
name: runner-config
- name: work
ephemeral:
volumeClaimTemplate:
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "local-path"
resources:
requests:
storage: 1Gi
spec:
securityContext:
fsGroup: 1001
containers:
- name: runner
image: ghcr.io/actions/actions-runner:latest
command: ["/home/runner/run.sh"]
env:
- name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "false"
nodeSelector:
purpose: github-actions-arm
tolerations:
- key: purpose
operator: Equal
value: github-actions-arm
effect: NoSchedule
## Optional controller service account that needs to have required Role and RoleBinding
## to operate this gha-runner-scale-set installation.
## The helm chart will try to find the controller deployment and its service account at installation time.
## In case the helm chart can't find the right service account, you can explicitly pass in the following value
## to help it finish RoleBinding with the right service account.
## Note: if your controller is installed to only watch a single namespace, you have to pass these values explicitly.
# controllerServiceAccount:
# namespace: arc-system
# name: test-arc-gha-runner-scale-set-controller
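The hook template referenced above via ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE is mounted from the runner-config ConfigMap. A minimal sketch for creating it, assuming the template is saved locally as runner-template.yaml and the runners are deployed in a namespace named arc-runners (the namespace is a placeholder):

kubectl create configmap runner-config \
  --namespace arc-runners \
  --from-file=runner-template.yaml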
Controller Logs
https://gist.github.com/kanakaraju17/4f58c0b332451ef6fab345a8078a6b3b
Runner Pod Logs
https://gist.github.com/kanakaraju17/c61f8da3038741634acea68f40c12afc
Also interested in this, since we want to pull secrets into our repository workflows with hashicorp/vault-action.
I ran into this exact same issue using Kaniko and figured out a solution. When Node.js-based actions such as actions/checkout run, they're executed with the Node.js 20 runtime that is externally mounted into the workflow container at /__e/node20/bin/node. This Node.js executable isn't statically linked, so it dynamically links against libraries from the host it's running on, namely all of these:
$ ldd externals/node20/bin/node
linux-vdso.so.1 (0x000075316a1e9000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x000075316a1db000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000753169faf000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000753169ec8000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000753169ea8000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000753169ea3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000753169c78000)
/lib64/ld-linux-x86-64.so.2 (0x000075316a1eb000)
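A quick way to confirm this from inside the stock gcr.io/kaniko-project/executor:debug container, using only BusyBox built-ins (the ldd output above names /lib64/ld-linux-x86-64.so.2 as the ELF interpreter):

# run inside the Kaniko debug job container
ls /lib64/ld-linux-x86-64.so.2 || echo "dynamic loader missing"
ls /lib/x86_64-linux-gnu/libc.so.6 || echo "libc missing"
# a missing ELF interpreter is exactly what BusyBox sh reports as "not found"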
The problem lies in how the Kaniko container image is built: it uses the empty scratch base image and copies in the Kaniko executable, some tools and, in the debug variant, BusyBox. As a result the Kaniko image doesn't contain any of the usual shared libraries. That's fine for Kaniko itself, which is written in Go and entirely statically linked, but Node.js won't work as-is because of the missing libraries, which causes the not-very-descriptive "not found" error. My solution is to build a custom Kaniko container image that includes the libraries from a relatively recent distro that supports Node.js 20; I used Debian bookworm:
FROM debian:bookworm-slim AS debian
# containerd/ARC attempt to run shell stuff inside the container, use the debug image since it
# contains busybox + utilities
FROM gcr.io/kaniko-project/executor:debug
# lie about the container being Debian to make some ARC stuff behave nicely
COPY --from=debian /etc/os-release /etc/os-release
# ARC runs nodejs actions on the workflow container by mounting node to the container. nodejs is
# dynamically linked and the Kaniko container doesn't contain any supporting libraries for node to
# run, so copy required libraries to the container
COPY --from=debian /lib/x86_64-linux-gnu/libdl.so.2 /lib/x86_64-linux-gnu/libdl.so.2
COPY --from=debian /lib/x86_64-linux-gnu/libstdc++.so.6 /lib/x86_64-linux-gnu/libstdc++.so.6
COPY --from=debian /lib/x86_64-linux-gnu/libm.so.6 /lib/x86_64-linux-gnu/libm.so.6
COPY --from=debian /lib/x86_64-linux-gnu/libgcc_s.so.1 /lib/x86_64-linux-gnu/libgcc_s.so.1
COPY --from=debian /lib/x86_64-linux-gnu/libpthread.so.0 /lib/x86_64-linux-gnu/libpthread.so.0
COPY --from=debian /lib/x86_64-linux-gnu/libc.so.6 /lib/x86_64-linux-gnu/libc.so.6
COPY --from=debian /lib64/ld-linux-x86-64.so.2 /lib64/ld-linux-x86-64.so.2
WORKDIR /workspace
ENTRYPOINT ["/kaniko/executor"]
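Build and push the image to a registry your runners can pull from, then point the job's container: image: at it; registry and tag below are placeholders:

docker build -t registry.example.com/kaniko-executor:debug-node .
docker push registry.example.com/kaniko-executor:debug-node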
Closing this one since it is not related to ARC.
Thank you, @Spanfile, for answering this issue!