
operator-sdk 1.20.0 breaks k8s_status in FIPS enabled OpenShift cluster

Open efussi opened this issue 3 years ago • 3 comments

Bug Report

What did you do?

I have an Ansible operator image based on quay.io/operator-framework/ansible-operator:v1.19.1 which adds the kubernetes.core:2.3.0 and operator_sdk.util:0.4.0 collections in requirements.yaml. One of the playbook tasks sets the status of a CR like so:

- name: Set status to {{ status }} for {{ ansible_operator_meta.name }} in {{ ansible_operator_meta.namespace }}
  k8s_status:
    api_version: "acme.com/v1beta1"
    kind: AcmeThing
    name: "{{ ansible_operator_meta.name }}"
    namespace: "{{ ansible_operator_meta.namespace }}"
    status:
      acmeStatus: "{{ status }}"
      acmeVersion: "{{ version | default(omit) }}"
  register: set_cr_status
  retries: 3
  delay: 5
  until: set_cr_status is not failed

This works just fine on my FIPS-enabled OCP 4.8 cluster.

What did you expect to see?

When I change the base image to ansible-operator:v1.20.0, the task continues to work.

What did you see instead? Under which circumstances?

When I change the base image to ansible-operator:v1.20.0, the k8s_status task fails:

fatal: [localhost]: FAILED! => {"attempts": 3, "changed": false, "error": "[digital envelope routines: EVP_DigestInit_ex] disabled for FIPS", "msg": "Failed to get client due to %s"}

Environment

Operator type:

/language ansible

Kubernetes cluster type:

OpenShift 4.8.39

$ operator-sdk version

operator-sdk version: "v1.20.0", commit: "deb3531ae20a5805b7ee30b71f13792b80bd49b1", kubernetes version: "1.23", go version: "go1.17.9", GOOS: "linux", GOARCH: "amd64"

$ go version (if language is Go)

$ kubectl version

$ oc version
Client Version: 4.8.36
Server Version: 4.8.39
Kubernetes Version: v1.21.8+ed4d8fd

Possible Solution

The problem seems to be related to the use of MD5 hashes, which are restricted in FIPS mode; compare https://github.com/s3tools/s3cmd/issues/1005#issuecomment-578241131.
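
To confirm that a container or node is actually running in FIPS mode before chasing the MD5 restriction, a minimal sketch along these lines may help. It assumes a Linux host that exposes the kernel flag at /proc/sys/crypto/fips_enabled; the function name fips_enabled is made up for this example and is not part of operator-sdk or any collection.

```python
from pathlib import Path

# Kernel flag on Linux hosts; the path is a parameter so the check can be
# pointed at another file (assumption: the flag file contains "1" when FIPS is on).
FIPS_FLAG = "/proc/sys/crypto/fips_enabled"


def fips_enabled(flag_path: str = FIPS_FLAG) -> bool:
    """Return True if the kernel reports FIPS mode, False if not or unknown."""
    try:
        return Path(flag_path).read_text().strip() == "1"
    except OSError:
        # Flag file missing or unreadable: treat as "not in FIPS mode".
        return False


if __name__ == "__main__":
    print("FIPS mode enabled:", fips_enabled())
```

Running this inside the operator container should indicate whether the failure above is expected to reproduce there.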

Additional context

efussi avatar May 03 '22 08:05 efussi

I patched my operator to run with ANSIBLE_VERBOSITY=3 and was able to gather the stack trace:

The full traceback is:
  File "/tmp/ansible_k8s_status_payload_bi0wnjm8/ansible_k8s_status_payload.zip/ansible_collections/operator_sdk/util/plugins/module_utils/api_utils.py", line 86, in get_api_client
    client = DynamicClient(kubernetes.client.ApiClient(configuration))
  File "/usr/local/lib/python3.8/site-packages/openshift/dynamic/client.py", line 40, in __init__
    K8sDynamicClient.__init__(self, client, cache_file=cache_file, discoverer=discoverer)
  File "/usr/local/lib/python3.8/site-packages/kubernetes/dynamic/client.py", line 84, in __init__
    self.__discoverer = discoverer(self, cache_file)
  File "/usr/local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 224, in __init__
    Discoverer.__init__(self, client, cache_file)
  File "/usr/local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 48, in __init__
    default_cachefile_name = 'osrcp-{0}.json'.format(hashlib.md5(default_cache_id).hexdigest())
fatal: [localhost]: FAILED! => {
    "attempts": 3,
    "changed": false,
    "error": "[digital envelope routines: EVP_DigestInit_ex] disabled for FIPS",

Comparing the pip freeze output for ansible-operator:v1.19.1 and ansible-operator:v1.20.0 shows that the kubernetes package version changed from 12.0.1 to 23.3.0. However, both versions seem to contain the same code:

$ grep md5 /usr/local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py
        default_cachefile_name = 'osrcp-{0}.json'.format(hashlib.md5(default_cache_id).hexdigest())
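
The pip freeze comparison between the two images can be sketched roughly as follows. This is only an illustration: the helper names parse_freeze and changed_versions are hypothetical, the sample input carries just the kubernetes versions mentioned above, and the unchanged-example entry is a made-up placeholder rather than a real package from either image.

```python
def parse_freeze(text: str) -> dict:
    """Map package name -> version for 'name==version' lines of pip freeze output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" not in line or line.startswith("#"):
            continue  # skip comments, editable installs, and URL requirements
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def changed_versions(old: str, new: str) -> dict:
    """Return {name: (old_version, new_version)} for packages that differ."""
    old_pkgs, new_pkgs = parse_freeze(old), parse_freeze(new)
    return {
        name: (old_pkgs.get(name), new_pkgs.get(name))
        for name in sorted(set(old_pkgs) | set(new_pkgs))
        if old_pkgs.get(name) != new_pkgs.get(name)
    }


if __name__ == "__main__":
    # Placeholder inputs; in practice these would be the captured
    # `pip freeze` outputs of the two base images.
    old = "kubernetes==12.0.1\nunchanged-example==1.0.0\n"
    new = "kubernetes==23.3.0\nunchanged-example==1.0.0\n"
    for name, (before, after) in changed_versions(old, new).items():
        print(f"{name}: {before} -> {after}")
```

A package that appears in only one image shows up with None on the other side, which helps spot newly added dependencies.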

When I patch discovery.py in my operator's Dockerfile, it works:

 && ansible-galaxy collection install -r ${HOME}/requirements.yml \
 && site_packages=/usr/local/lib/python3.8/site-packages \
 && sed -i -e 's/hashlib.md5(default_cache_id)/hashlib.md5(default_cache_id, usedforsecurity=False)/' ${site_packages}/kubernetes/dynamic/discovery.py \
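
For reference, the effect of the sed patch can be sketched in isolation. The snippet below is an assumption-laden illustration, not the code from the kubernetes package: the function name cache_file_digest is hypothetical, and the usedforsecurity keyword is only available on Python 3.9+ or on distribution builds that backport it, so the sketch falls back to the plain call when the keyword is rejected.

```python
import hashlib


def cache_file_digest(cache_id: bytes) -> str:
    """Hex MD5 digest used for a non-security purpose (naming a cache file)."""
    try:
        # Declares the hash as non-security use, which FIPS-enabled
        # OpenSSL builds may then permit.
        digest = hashlib.md5(cache_id, usedforsecurity=False)
    except TypeError:
        # Interpreter does not know the keyword; use the plain call
        # (this path would still fail under FIPS restrictions).
        digest = hashlib.md5(cache_id)
    return digest.hexdigest()


if __name__ == "__main__":
    # Mirrors the file name pattern seen in the traceback above.
    print("osrcp-{0}.json".format(cache_file_digest(b"abc")))
```

The digest only names a discovery cache file here, so marking it as not used for security does not weaken anything.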

While it's still not clear to me which of the Python package updates from 1.19.1 to 1.20.0 caused this, I think the proper fix here involves two steps:

  1. Update package kubernetes (tracked through https://github.com/kubernetes-client/python/issues/1851)
    • [x] https://github.com/kubernetes-client/python/pull/1854
    • [x] waiting for next release: https://github.com/kubernetes-client/python/releases/tag/v25.3.0
  2. Pull the updated package into operator-sdk (can be tracked through the subject issue)
    • [x] https://github.com/operator-framework/operator-sdk/releases/tag/v1.26.0

efussi avatar Jul 06 '22 09:07 efussi

The source code appears to be here: https://github.com/kubernetes-client/python/blob/2677e9c810b62a82e75e65d07e502d49ec74a551/kubernetes/base/dynamic/discovery.py#L48

efussi avatar Jul 06 '22 09:07 efussi

I observed a FIPS issue with version 0.13.1 of the Python openshift package: https://github.com/openshift/openshift-restclient-python/issues/427#issuecomment-1103702707

It looks like the Ansible operator now uses openshift version 0.13.1: https://github.com/operator-framework/operator-sdk/commit/9bb14cc42e1bf1e3d769961f7ecb8b4c67012523 https://github.com/operator-framework/operator-sdk/blob/master/images/ansible-operator/Pipfile.lock#L265

venkataramanam avatar Jul 26 '22 16:07 venkataramanam

With https://github.com/kubernetes-client/python/releases/tag/v25.3.0 released, the above patch in the operator's Dockerfile can be changed to:

 && pip3 install --no-cache-dir kubernetes~=25.3.0 \
 && ansible-galaxy collection install -r ${HOME}/requirements.yml \
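
To double-check that an image really ends up with a fixed kubernetes package after the pip3 install step, a small check like the following could be run inside the container. It is a sketch under assumptions: the helper names parse_version and has_fix are invented for this example, and the threshold 25.3.0 is taken from the release referenced in this thread.

```python
from importlib import metadata

# First release reported in this thread to contain the fix.
FIXED_VERSION = (25, 3, 0)


def parse_version(text: str) -> tuple:
    """Turn '25.3.0' into (25, 3, 0); parsing stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(version_text: str) -> bool:
    """True if the given kubernetes package version is at least FIXED_VERSION."""
    return parse_version(version_text) >= FIXED_VERSION


if __name__ == "__main__":
    try:
        installed = metadata.version("kubernetes")
    except metadata.PackageNotFoundError:
        print("kubernetes package is not installed")
    else:
        print(f"kubernetes {installed}: fix present = {has_fix(installed)}")
```

The version parsing is deliberately simple and treats pre-release suffixes conservatively; a full comparison would need a proper version library.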

efussi avatar Oct 26 '22 06:10 efussi

@efussi

Thank you. I did a quick test: installing openshift==0.13.1 pulled in kubernetes==25.3.0 as a dependency, which has the fix you committed.

venkataramanam avatar Oct 26 '22 07:10 venkataramanam

https://github.com/operator-framework/operator-sdk/releases/tag/v1.26.0 contains kubernetes 25.3.0, which has the fix.

efussi avatar Dec 10 '22 09:12 efussi