trinityX

Image creation fails

Open javree opened this issue 2 years ago • 17 comments

Following the install guide at https://docs.clustervision.com/install/install/ on Rocky Linux 8.9, the controller install went fine and Ansible finished without issues. However, image creation fails:

TASK [init : Install init packages] ************************************************************************************************
failed: [compute.osimages.luna] (item=python3-libselinux) => {"ansible_loop_var": "item", "changed": false, "item": "python3-libselinux", "msg": "Could not import the dnf python module using /usr/libexec/platform-python (3.6.8 (default, Jan 15 2024, 23:09:02) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)]). Please install python3-dnf or python2-dnf package or ensure you have specified the correct ansible_python_interpreter. (attempted ['/usr/libexec/platform-python', '/usr/bin/python3', '/usr/bin/python2', '/usr/bin/python'])", "results": []}

PLAY RECAP *************************************************************************************************************************
compute.osimages.luna : ok=3 changed=0 unreachable=0 failed=1 skipped=1 rescued=0 ignored=0
controller1 : ok=52 changed=5 unreachable=0 failed=0 skipped=34 rescued=0 ignored=0

[root@marclus0 site]# cat /etc/redhat-release
Rocky Linux release 8.9 (Green Obsidian)

[root@marclus0 site]# rpm -qa | grep -i ansible
ansible-8.3.0-1.el8.noarch
ansible-core-2.15.3-1.el8.x86_64

We haven't edited anything in the playbook.

javree, Jan 18 '24

I assume you just executed ansible-playbook compute-redhat.yml? I just reinstalled Rocky 8.9, rolled out controller.yml and compute-redhat.yml, and they work here.

PLAY RECAP ****************************************************************************************************************************************************************
compute.osimages.luna      : ok=124  changed=74   unreachable=0    failed=0    skipped=136  rescued=0    ignored=1
controller1                : ok=55   changed=20   unreachable=0    failed=0    skipped=33   rescued=0    ignored=0

Do you have the python3-dnf packages installed? For example:

rpm -qa | grep python3-dnf
python3-dnf-plugin-versionlock-4.0.21-23.el8.noarch
python3-dnf-4.7.0-19.el8.noarch
python3-dnf-plugins-core-4.0.21-23.el8.noarch
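Since the failing host in the log is the image (compute.osimages.luna) rather than the controller, it can help to check which interpreter can actually import dnf in each place, not just whether the RPM is installed on the controller. A minimal sketch, assuming root access for chroot and an image root such as /trinity/images/compute; probe_dnf is a hypothetical helper, not part of TrinityX:

```shell
#!/bin/sh
# Hypothetical helper: report which Python interpreters can import dnf.
# $1 is "/" to test the local system, or an image root to test via chroot.
# Remaining arguments are interpreter paths as seen from inside that root.
probe_dnf() {
    local root="$1"
    shift
    for py in "$@"; do
        if [ ! -x "${root%/}${py}" ]; then
            echo "${py}: not found"
        elif [ "$root" = "/" ] && "$py" -c 'import dnf' 2>/dev/null; then
            echo "${py}: dnf OK"
        elif [ "$root" != "/" ] && chroot "$root" "$py" -c 'import dnf' 2>/dev/null; then
            echo "${py}: dnf OK"
        else
            echo "${py}: no dnf"
        fi
    done
}
```

For example, probe_dnf / /usr/libexec/platform-python /usr/bin/python3.11 checks the controller, and probe_dnf /trinity/images/compute /usr/libexec/platform-python /usr/bin/python3 checks the image. The interpreter paths are the ones Ansible listed as attempted in the error message.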

msteggink, Jan 18 '24

I have exactly those RPMs installed:
python3-dnf-plugin-versionlock-4.0.21-23.el8.noarch
python3-dnf-4.7.0-19.el8.noarch
python3-dnf-plugins-core-4.0.21-23.el8.noarch

Yes, I executed exactly that command. I've just run ansible-playbook -vvvv compute-redhat.yml >> log.txt 2>&1 and attached its output, as well as a full RPM list: log.txt, rpmlist.txt

javree, Jan 18 '24

Hi @javree, for some reason it picked up Python 3.6 in the image. python36 is pulled in by gdm, OpenHPC and OOD. Was this your second Ansible run? Did you do anything with your environment (env/set)?

Can you run:

ansible --version | grep python

Can you also retry the run by adding the following to ansible.cfg?

interpreter_python=/usr/bin/python3.11
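Ansible reads interpreter_python from the [defaults] section of ansible.cfg, so the line above has to sit under that header (a [defaults] line followed by interpreter_python = /usr/bin/python3.11). A small sketch to confirm what a given ansible.cfg actually contains; cfg_interpreter_python is a hypothetical helper, and ansible-config dump --only-changed is the built-in way to see the effective configuration:

```shell
#!/bin/sh
# Hypothetical helper: print interpreter_python from the [defaults] section
# of the ansible.cfg given as $1 (prints nothing if it is not set there).
cfg_interpreter_python() {
    awk '
        /^[[:space:]]*\[/ {
            # Track whether we are inside the [defaults] section.
            in_defaults = ($0 ~ /^[[:space:]]*\[defaults\][[:space:]]*$/)
            next
        }
        in_defaults && /^[[:space:]]*interpreter_python[[:space:]]*=/ {
            sub(/^[^=]*=[[:space:]]*/, "")   # drop the key and the equals sign
            sub(/[[:space:]]+$/, "")         # drop trailing whitespace
            print
            exit
        }
    ' "$1"
}
```

Running cfg_interpreter_python ansible.cfg from the site directory should print /usr/bin/python3.11 once the line is in place; an empty result means the key is missing or sits under another section.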

msteggink, Jan 19 '24

@javree, did that line fix it for compute-redhat.yml?

msteggink, Jan 22 '24

Unfortunately not. Since you mentioned it might have something to do with running the playbook multiple times, I am in the middle of fully redeploying the controller to start fresh.

javree, Jan 22 '24

> Unfortunately no; since you mentioned it might have something to do with running the playbook multiple times, I am underway fully redeploying the controller and start fresh.

So what you're basically telling us is that the Ansible playbooks are not idempotent?

I wonder if starting from fresh solved the issue 🤔

xdkreij, Jan 24 '24

I'll report next week on how a fresh install goes.

javree, Jan 24 '24

Sorry for the delay in getting back. I did a full reinstall of the controller from a Rocky 8.9 USB key, ran through the procedure again and changed nothing else. Yet again, exactly the same issue... I have not touched the compute-redhat.yml file in any way.

Note: on a default Rocky 8 install, Python 3.6 is the system default. Ansible on Rocky 8 now uses Python 3.11, but there is no python3.11-dnf package, so adding Python 3.11 will break things elsewhere... I'm seriously wondering how this can work at all.

javree, Feb 08 '24

Just for giggles I tried the compute-ubuntu playbook and that completed fine, so hopefully I can at least boot a node soon... But the issue of Ansible using Python 3.11 while dnf uses Python 3.6 remains when trying to build a RHEL image.

javree, Feb 20 '24

I've tried this again, but this time with Rocky Linux 9.3 on the controller, and there everything appears to work just fine.

javree, Feb 26 '24

@javree thank you for your feedback! I think we had a fix for 8.x, but I'll need to double-check that.

msteggink, Feb 26 '24

There have been quite a few changes in how we prepare (install) the Ansible environment before running the playbook. Although these have not been pushed to GitHub yet, I expect (hope) that these issues will be a thing of the past. We aim to push in about 2-3 weeks from today; we are finalizing the new monitoring stack and H/A.

aphmschonewille, Mar 16 '24

The latest and greatest has been pushed.

aphmschonewille, Mar 31 '24

Very happy to report that with the new release all is well on Rocky 8 too!

javree, Apr 02 '24

Hate to reopen this ...

I did a fresh checkout on a fully up-to-date Rocky 8.9 machine. Running

marclus0 18:43:54 [root@marclus0 site]# ansible-playbook compute-redhat.yml

Gives me

TASK [trix-tree : Create Trinity H/A directory structure on controllers] ****************************************************************************************************************************************
skipping: [compute.osimages.luna]

TASK [init : Install init packages] *****************************************************************************************************************************************************************************
failed: [compute.osimages.luna] (item=python3-libselinux) => {"ansible_loop_var": "item", "changed": false, "item": "python3-libselinux", "msg": "Could not import the dnf python module using /usr/libexec/platform-python (3.6.8 (default, Apr 24 2024, 21:55:04) [GCC 8.5.0 20210514 (Red Hat 8.5.0-22)]). Please install python3-dnf or python2-dnf package or ensure you have specified the correct ansible_python_interpreter. (attempted ['/usr/libexec/platform-python', '/usr/bin/python3', '/usr/bin/python2', '/usr/bin/python'])", "results": []}

PLAY RECAP ****************************************************************************************************************************************************************************************************** compute.osimages.luna : ok=3 changed=1 unreachable=0 failed=1 skipped=2 rescued=0 ignored=0 controller1 : ok=59 changed=20 unreachable=0 failed=0 skipped=43 rescued=0 ignored=0

marclus0 18:55:33 [root@marclus0 site]#

Again, the conflict between Python 3.6 (the system default) and Ansible's Python 3.11.

javree, Jun 13 '24

... one thing truly amazes me every time: how something that is supposedly 'generic', like a Rocky install (or Red Hat, or Alma, or...), can be so different anywhere in the world.... I'll get back to you. On Rocky 8.10 (I've done more than 10 installs today alone) everything works as expected. Not sure whether Rocky 8.9 is now deviating? Last week 8.9 was also just fine... It truly amazes me..... -A

aphmschonewille, Jun 13 '24

A hint, as I encountered the same issue today: verify your subscription within the image.

Also, within the image, run watch -n 0.1 "cat /etc/yum.repos.d/redhat.repo" (I am using Red Hat instead).

On RHEL 8.10, the problem lies with the default baseurl in /etc/rhsm/rhsm.conf: somewhere down the line, redhat.repo starts redirecting to cdn.redhat.com instead of our own Satellite server.

I changed the rhsm.conf baseurl to our own Satellite server, and it 'stopped' changing to cdn.redhat.com, so the correct packages were installed.

To test, try installing python3-libselinux manually within the image, before and after 'fixing' the subscription.

Note: redhat.repo is still configured correctly when this task reports OK: TASK [trinity/image-create : Install redhat-release package in /trinity/images/compute] *******************************************************************************

But right before or during the tasks that install the external RPM packages, redhat.repo gets 'overruled' by rhsm.conf.
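The subscription check described above can be scripted so it is easy to repeat before and after changing rhsm.conf. A sketch, assuming the image root is /trinity/images/compute and the standard RHSM file locations (/etc/rhsm/rhsm.conf with baseurl in its [rhsm] section, and /etc/yum.repos.d/redhat.repo); check_rhsm_repos is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: show where RHSM inside an image will point dnf.
# $1 is the image root (defaults to /trinity/images/compute).
check_rhsm_repos() {
    local image="${1:-/trinity/images/compute}"
    local conf="${image%/}/etc/rhsm/rhsm.conf"
    local repo="${image%/}/etc/yum.repos.d/redhat.repo"
    local url count

    # The content base URL is the baseurl key in the [rhsm] section.
    url=$(awk '
        /^[[:space:]]*\[/ { in_rhsm = ($0 ~ /^[[:space:]]*\[rhsm\]/); next }
        in_rhsm && /^[[:space:]]*baseurl[[:space:]]*=/ {
            sub(/^[^=]*=[[:space:]]*/, ""); print; exit
        }' "$conf" 2>/dev/null)
    echo "rhsm.conf baseurl: ${url:-<not set>}"

    # Count repo baseurl lines that point at the public CDN.
    if [ -f "$repo" ]; then
        count=$(grep -c '^[[:space:]]*baseurl[[:space:]]*=.*cdn\.redhat\.com' "$repo")
        echo "redhat.repo baseurls on cdn.redhat.com: ${count}"
    else
        echo "redhat.repo: missing"
    fi
}
```

Run check_rhsm_repos /trinity/images/compute before and after pointing baseurl at your Satellite server, then repeat the manual install test from the comment, for example chroot /trinity/images/compute dnf -y install python3-libselinux.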

Second: please set interpreter_python=/usr/libexec/platform-python in ansible.cfg. It resolves a lot of issues, at least on Red Hat, including this one (together with the subscription fix above).

Last note: the controller has a different range of supported Python versions than the targets, which is why you will also run into problems on RHEL 8 if you use Ansible 2.17 (ansible-core 2.17 no longer supports Python 3.6, the RHEL 8 platform-python, on targets).

(I've used ansible 2.15.x on the controller instead)
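The version mismatch in the last note can be turned into a quick pre-flight check. A sketch, assuming GNU sort -V for version ordering and that ansible-core 2.17 requires Python 3.7 or newer on managed nodes; version_ge and check_core_vs_target are hypothetical helpers:

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on GNU coreutils "sort -V" for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: flag ansible-core 2.17+ paired with a target
# Python older than 3.7 (such as the 3.6 platform-python on EL8).
check_core_vs_target() {
    local core="$1" target_py="$2"
    if version_ge "$core" 2.17 && ! version_ge "$target_py" 3.7; then
        echo "incompatible: ansible-core ${core} cannot manage Python ${target_py} targets"
    else
        echo "ok: ansible-core ${core} with Python ${target_py}"
    fi
}
```

For instance, check_core_vs_target 2.15.3 3.6.8 (the ansible-core version from the rpm list earlier in this thread and the platform-python version from the error) reports ok, while any 2.17.x paired with 3.6.8 is flagged.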

xdkreij, Jun 26 '24

Since 14.4 has been out for a while and we have tested it extensively, is this still occurring?

-A

aphmschonewille, Feb 14 '25

Hi all..... Trix 15 is out and this ticket has gone stale. Can I close it and reopen it when needed? -A

aphmschonewille, Apr 10 '25