standard_init_linux.go:228: exec user process caused: exec format error
Bug Report
What did you do?
I installed OLM on linux/s390x and connected to a catalog of operators.
I then attempted to install an operator. This resulted in an InstallPlan resource and a Job resource being created in the olm namespace.
The Job failed, and looking at the corresponding pod I could see that it failed to run an initContainer called util:
kubectl logs 648e77c9a7adc015c096cf0e2667326e167263a93266a3c6f52b4a62adlgnf2 util -n olm
standard_init_linux.go:228: exec user process caused: exec format error
Googling for this error suggests that the image is built for the wrong architecture, and this is where I'm confused as to how that can happen. From the pod YAML, this is the image it's trying to run:
initContainers:
- command:
- /bin/cp
- -Rv
- /bin/cpb
- /util/cpb
image: quay.io/operator-framework/olm@sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
imagePullPolicy: IfNotPresent
name: util
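For reference, one way to check which architecture a binary (like the /bin/cpb above, or /bin/cp inside the image) was compiled for, without relying on file(1), is to read the e_machine field of its ELF header. This is only a sketch; the code values below cover just the architectures discussed in this issue:

```shell
# Print which CPU architecture an ELF binary targets by reading its
# e_machine field (2 bytes at offset 18). Values: 0x003e = amd64,
# 0x0016 = s390x, 0x00b7 = arm64. The byte order in the file follows
# the binary's own endianness, hence the per-arch patterns below.
elf_arch() {
  code=$(od -An -t x1 -j 18 -N 2 "$1" | tr -d ' \n')
  case "$code" in
    3e00) echo amd64 ;;  # little-endian x86-64
    0016) echo s390x ;;  # big-endian s390x
    b700) echo arm64 ;;  # little-endian aarch64
    *)    echo "unknown ($code)" ;;
  esac
}

# e.g. run against the binary the initContainer tries to execute:
elf_arch /bin/sh
```

Running this against /bin/cp inside the failing image would show directly whether the binary matches the node's architecture.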
This appears to be exactly the same image that is used for the olm-operator pod, which is running successfully.
containers:
- args:
- --namespace
- $(OPERATOR_NAMESPACE)
- --writeStatusName
- ""
command:
- /bin/olm
env:
- name: OPERATOR_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: OPERATOR_NAME
value: olm-operator
image: quay.io/operator-framework/olm@sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
imagePullPolicy: IfNotPresent
kubectl get pods -n olm
NAME READY STATUS RESTARTS AGE
648e77c9a7adc015c096cf0e2667326e167263a93266a3c6f52b4a62adlgnf2 0/1 Init:Error 0 2d1h
648e77c9a7adc015c096cf0e2667326e167263a93266a3c6f52b4a62admm8n6 0/1 Init:Error 0 2d1h
catalog-operator-8d9d97478-8v5mx 1/1 Running 0 6d21h
ibm-operator-catalog-x9gln 1/1 Running 0 2d2h
olm-operator-64b58958bb-pprtx 1/1 Running 0 21h
packageserver-545b4f5db8-nf42n 1/1 Running 0 6d23h
packageserver-545b4f5db8-pkp9h 1/1 Running 0 6d21h
So I'm confused as to how an image can run successfully in one container yet fail when used in another, unless that image really does contain a binary for the wrong architecture?
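For context on how this can happen: a digest like the one above can point at a multi-arch manifest list, which maps each platform to its own per-architecture image digest (you can see the real mapping with `docker manifest inspect quay.io/operator-framework/olm@sha256:...`). The sketch below runs against an illustrative inline manifest list; the digests are placeholders, not the real quay.io entries:

```shell
# Illustrative manifest list: a multi-arch digest resolves to one
# per-architecture image digest per platform. The digests below are
# made-up placeholders, not the real operator-framework/olm entries.
manifest_list='
{"manifests":[
 {"digest":"sha256:aaaaaaaa","platform":{"architecture":"amd64","os":"linux"}},
 {"digest":"sha256:bbbbbbbb","platform":{"architecture":"s390x","os":"linux"}}
]}'

# Crude text matching to pull out the digest for one architecture;
# in practice you would use `docker manifest inspect` plus a JSON parser.
digest_for_arch() {
  printf '%s\n' "$manifest_list" | tr '}' '\n' \
    | grep "\"$1\"" | grep -o 'sha256:[a-z0-9]*' | head -n1
}

digest_for_arch s390x
```

The per-arch digest the runtime resolves is what actually determines which binaries end up in the container, which is why two pods referencing the same top-level digest can still behave differently if one arch variant was built wrongly.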
What did you expect to see?
The Job to run to completion.
What did you see instead? Under which circumstances?
The pod for the Job failed at the first initContainer.
Environment
- operator-lifecycle-manager version:
sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
- Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"linux/s390x"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.2", GitCommit:"9d142434e3af351a628bffee3939e64c681afa4d", GitTreeState:"clean", BuildDate:"2022-01-19T17:29:16Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/s390x"}
- Kubernetes cluster kind: Vanilla cluster with only 1 node (I had to add tolerations to various pods to get them to run on the control plane node)
Possible Solution
Additional context
In order to support different architectures in our release builds, OLM is fed information from the environment when choosing which image to use in the unpacker pod; something seems to be going wrong here. Could you please share:
- Which version of OLM you installed.
- Where you got the OLM manifests.
- The entire output of the OLM Deployment yaml.
Hello,
I installed OLM via the operator-sdk, using the following set of commands:
apt-get install operator-sdk
export ARCH=$(case $(uname -m) in x86_64) echo -n amd64 ;; aarch64) echo -n arm64 ;; *) echo -n $(uname -m) ;; esac)
export OS=$(uname | awk '{print tolower($0)}')
export OPERATOR_SDK_DL_URL=https://github.com/operator-framework/operator-sdk/releases/download/v1.23.0
curl -LO ${OPERATOR_SDK_DL_URL}/operator-sdk_${OS}_${ARCH}
gpg --keyserver keyserver.ubuntu.com --recv-keys 052996E2A20B5C7E
curl -LO ${OPERATOR_SDK_DL_URL}/checksums.txt
curl -LO ${OPERATOR_SDK_DL_URL}/checksums.txt.asc
gpg -u "Operator SDK (release) <[email protected]>" --verify checksums.txt.asc
grep operator-sdk_${OS}_${ARCH} checksums.txt | sha256sum -c -
chmod +x operator-sdk_${OS}_${ARCH} && sudo mv operator-sdk_${OS}_${ARCH} /usr/local/bin/operator-sdk
operator-sdk
operator-sdk olm -h
operator-sdk olm install
The operator-sdk version is operator-sdk version: "v1.23.0", commit: "1eaeb5adb56be05fe8cc6dd70517e441696846a4", kubernetes version: "1.24.2", go version: "go1.18.5", GOOS: "linux", GOARCH: "s390x", and from the SHA of the operator-framework/olm image, I think it's v0.22.
The full Deployment YAML is:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "2"
creationTimestamp: "2022-09-08T12:50:25Z"
generation: 2
labels:
app: olm-operator
name: olm-operator
namespace: olm
resourceVersion: "23091916"
uid: 70eff54d-6b69-47ca-bbef-8b6886972605
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: olm-operator
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: olm-operator
spec:
containers:
- args:
- --namespace
- $(OPERATOR_NAMESPACE)
- --writeStatusName
- ""
command:
- /bin/olm
env:
- name: OPERATOR_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: OPERATOR_NAME
value: olm-operator
image: quay.io/operator-framework/olm@sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 8080
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
name: olm-operator
ports:
- containerPort: 8080
name: metrics
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 8080
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
requests:
cpu: 10m
memory: 160Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
dnsPolicy: ClusterFirst
nodeSelector:
kubernetes.io/os: linux
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
runAsNonRoot: true
seccompProfile:
type: RuntimeDefault
serviceAccount: olm-operator-serviceaccount
serviceAccountName: olm-operator-serviceaccount
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2022-09-08T12:50:25Z"
lastUpdateTime: "2022-09-08T12:57:57Z"
message: ReplicaSet "olm-operator-64b58958bb" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2022-09-14T15:39:12Z"
lastUpdateTime: "2022-09-14T15:39:12Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 2
readyReplicas: 1
replicas: 1
updatedReplicas: 1
I did have to edit the YAML to add a toleration to run the Deployment on my control-plane node.
In case it helps with debugging, I tried an experiment directly with the image, using Docker on the linux/s390x system.
# docker pull quay.io/operator-framework/olm@sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
quay.io/operator-framework/olm@sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b: Pulling from operator-framework/olm
Digest: sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
Status: Image is up to date for quay.io/operator-framework/olm@sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
quay.io/operator-framework/olm@sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
First, run the default entrypoint, which works (the olm binary starts and only fails because no kubeconfig is available):
# docker run quay.io/operator-framework/olm@sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
time="2022-09-20T10:30:42Z" level=info msg="log level info"
{"level":"error","ts":1663669842.7247128,"logger":"controller-runtime.client.config","msg":"unable to get kubeconfig","error":"invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable","errorCauses":[{"error":"no configuration has been provided, try setting KUBERNETES_MASTER environment variable"}],"stacktrace":"sigs.k8s.io/controller-runtime/pkg/client/config.GetConfigOrDie\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/client/config/config.go:153\nmain.Manager\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/cmd/olm/manager.go:50\nmain.main\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/cmd/olm/main.go:135\nruntime.main\n\t/opt/hostedtoolcache/go/1.18.5/x64/src/runtime/proc.go:250"}
Now run /bin/cp as the entrypoint, which fails:
# docker run --entrypoint /bin/cp quay.io/operator-framework/olm@sha256:2b4fee73c05069d9d2c537c7d3072241097914748abfb938b5b08c969b2f544b
standard_init_linux.go:228: exec user process caused: exec format error
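One more check that might help narrow this down (a sketch; the inline JSON stands in for real `docker image inspect` output so it runs without a daemon): Docker records the architecture the image config claims, which can disagree with what the layers actually contain.

```shell
# The architecture an image *claims* can be read with:
#   docker image inspect --format '{{.Architecture}}' <image>
# The inline JSON below is an illustrative stand-in for raw
# `docker image inspect` output (a JSON array of image configs),
# so this sketch runs offline.
inspect_output='[{"Os":"linux","Architecture":"s390x"}]'

claimed_arch=$(printf '%s' "$inspect_output" \
  | grep -o '"Architecture":"[a-z0-9]*"' | cut -d'"' -f4)
echo "image claims: $claimed_arch"
echo "host is:      $(uname -m)"
# If these agree but a binary inside the image still hits
# "exec format error", some layer of the image was built for a
# different architecture than the config claims.
```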
I also tried deliberately pulling the s390x image down to an amd64 system and re-ran the same commands. In this case /bin/cp executed, so my guess is that the s390x image is being built on an amd64 base layer.
# docker run quay.io/operator-framework/olm@sha256:14afcf5c38f7055cb5a45a053da10791469e58de264bc449bef24f54b8bb6be2
WARNING: The requested image's platform (linux/s390x) does not match the detected host platform (linux/amd64) and no specific platform was requested
exec /bin/olm: exec format error
# docker run --entrypoint /bin/cp quay.io/operator-framework/olm@sha256:14afcf5c38f7055cb5a45a053da10791469e58de264bc449bef24f54b8bb6be2
WARNING: The requested image's platform (linux/s390x) does not match the detected host platform (linux/amd64) and no specific platform was requested
BusyBox v1.34.1 (2022-04-13 00:26:55 UTC) multi-call binary.
Usage: cp [-arPLHpfinlsTu] SOURCE DEST
or: cp [-arPLHpfinlsu] SOURCE... { -t DIRECTORY | DIRECTORY }