
[BUG] minion crashes when package not managed by systemctl

Open dnessett opened this issue 1 year ago • 6 comments

Description When I apply the SLS file given in the Setup section with the command also given there, the targeted minion crashes. This may be related to the package being systemd, which has no corresponding systemd.service unit for systemctl to manage. (Ignore the state ID "restart_minion": the code was taken and modified to work on packages other than salt-minion without renaming that section.)

Setup

# update-package.sls
# Check the version of a package and patch to the latest version.
# Sample command: salt <minionid> state.sls update-package pillar='{"package": "systemd"}' test=1
# Running the above with "test=1" lets you see whether an update is needed for the package before actually updating it.
{% set package = salt['pillar.get']('package') %}

upgrade_{{ package }}:
  pkg.latest:
    - name: {{ package }}

restart_minion:
  service.running:
    - name: {{ package }}
    - watch:
      - upgrade_{{ package }}

and the following command:

sudo salt --timeout=600 'MOLS-H-03' state.sls update-package pillar='{"package": "systemd"}'

The minion does not return and the command times out. When checking the status of the minion on its machine, it seems the service.running code crashes the minion. However, when I run the command a second time, the minion does not crash.

Output from checking the status of the minion after running the command for the first time:

dnessett@MOLS-H-03:~$ sudo systemctl status salt-minion
[sudo] password for dnessett:             
× salt-minion.service - The Salt Minion
     Loaded: loaded (/lib/systemd/system/salt-minion.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2024-08-21 13:07:09 MDT; 16min ago
       Docs: man:salt-minion(1)
             file:///usr/share/doc/salt/html/contents.html
             https://docs.saltproject.io/en/latest/contents.html
    Process: 4795 ExecStart=/usr/bin/salt-minion (code=exited, status=1/FAILURE)
   Main PID: 4795 (code=exited, status=1/FAILURE)
        CPU: 943ms

Aug 21 13:07:08 MOLS-H-03 salt-minion[4054]: [ERROR   ] stderr: Running scope as unit: run-re2d3f7e5b7914a1dae1cae8845f>
Aug 21 13:07:08 MOLS-H-03 salt-minion[4054]: Failed to stop systemd.service: Unit systemd.service not loaded.
Aug 21 13:07:08 MOLS-H-03 salt-minion[4054]: [ERROR   ] retcode: 5
Aug 21 13:07:08 MOLS-H-03 salt-minion[4054]: [ERROR   ] Command '/usr/bin/systemd-run' failed with return code: 5
Aug 21 13:07:08 MOLS-H-03 salt-minion[4054]: [ERROR   ] stderr: Running scope as unit: run-r2ffa949b7afe4383b510e40ada9>
Aug 21 13:07:08 MOLS-H-03 salt-minion[4054]: Failed to start systemd.service: Unit systemd.service not found.
Aug 21 13:07:08 MOLS-H-03 salt-minion[4054]: [ERROR   ] retcode: 5
Aug 21 13:07:08 MOLS-H-03 salt-minion[4054]: [ERROR   ] Failed to start systemd.service: Unit systemd.service not found.
Aug 21 13:07:09 MOLS-H-03 systemd[1]: salt-minion.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 13:07:09 MOLS-H-03 systemd[1]: salt-minion.service: Failed with result 'exit-code'.

Output on the master when running the command a second time, after restarting the minion using systemctl:

dnessett@homelserv:~$ sudo salt --timeout=120 'MOLS-H-03' state.sls update-package pillar='{"package": "systemd"}'
MOLS-H-03:
----------
          ID: upgrade_systemd
    Function: pkg.latest
        Name: systemd
      Result: True
     Comment: Package systemd is already up-to-date
     Started: 14:40:08.506796
    Duration: 3657.397 ms
     Changes:   
----------
          ID: restart_minion
    Function: service.running
        Name: systemd
      Result: False
     Comment: The named service systemd is not available
     Started: 14:40:12.166861
    Duration: 14.847 ms
     Changes:   

Summary for MOLS-H-03
------------
Succeeded: 1
Failed:    1
------------
Total states run:     2
Total run time:   3.672 s
ERROR: Minions returned with non-zero exit code

dnessett@homelserv:~$ sudo salt 'MOLS-H-03' test.ping
MOLS-H-03:
    True

This bug is hard to replicate, since once the command completes the first time, the systemd package is updated. Running the same command the second time does not crash the minion. Checking the status of the minion after running the command a second time shows it is up and active.

  • [X] on-prem machine
  • [ ] VM (Virtualbox, KVM, etc. please specify)
  • [ ] VM running on a cloud service, please be explicit and add details
  • [ ] container (Kubernetes, Docker, containerd, etc. please specify)
  • [ ] or a combination, please be explicit
  • [ ] jails if it is FreeBSD
  • [ ] classic packaging
  • [X] onedir packaging
  • [ ] used bootstrap to install

Steps to Reproduce the behavior It is necessary that the minion machine is running a version of systemd that is not the latest. One way to replicate (although I have not tried this) is to take a timeshift backup, run the update code, then restore from the backup.

Expected behavior I expected systemd to be updated and the state update to return with success. There is an easy way to work around this problem: Make the update specific to systemd and get rid of the section labeled restart_minion:. I haven't tried this workaround, but expect it would work.

Screenshots No Screenshots are relevant.

Versions Report

Master:

dnessett@homelserv:~$ sudo salt --versions-report
[sudo] password for dnessett:        
Salt Version:
          Salt: 3006.9
 
Python Version:
        Python: 3.10.14 (main, Jun 26 2024, 11:44:37) [GCC 11.2.0]
 
Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.4
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.17.0
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: linuxmint 21.3 virginia
        locale: utf-8
       machine: x86_64
       release: 6.5.0-44-generic
        system: Linux
       version: Linux Mint 21.3 virginia

Minion:

dnessett@homelserv:~$ sudo salt-run 'MOLS-H-03' --versions-report
Salt Version:
          Salt: 3006.9
 
Python Version:
        Python: 3.10.14 (main, Jun 26 2024, 11:44:37) [GCC 11.2.0]
 
Dependency Versions:
          cffi: 1.14.6
      cherrypy: 18.6.1
  cryptography: 42.0.5
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.4
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.17.0
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: linuxmint 21.3 virginia
        locale: utf-8
       machine: x86_64
       release: 6.5.0-44-generic
        system: Linux
       version: Linux Mint 21.3 virginia
N/A


dnessett avatar Aug 21 '24 21:08 dnessett

I got this Salt code from @dmurphy18 in the discussion of issue #55103. However, this use of the code does not restart the minion (as mentioned above, the "restart_minion" label is a fossil that has nothing to do with how the code is used here). The package being updated is systemd, not salt-minion. Even if the package were salt-minion, the state should not crash the targeted minion.

dnessett avatar Aug 22 '24 15:08 dnessett

I have tested out the workaround mentioned above.

cat update-systemd.sls
# update-systemd.sls
# Check the version of systemd and patch to the latest version.
# Sample command: salt <minionid> state.sls update-systemd test=1
# running the above with "test=1" allows you to see if an update is needed for systemd before actually updating it.

upgrade_systemd:
  pkg.latest:
    - name: systemd

It works and systemd is restarted after the update.

dnessett avatar Aug 26 '24 15:08 dnessett

Your original code assumes that every package has a service that's the same name as the package. That is not true.

There's no bug here. You're just telling it to do something that is wrong.
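
One way to avoid that assumption is to pass the service name separately from the package name. The following is a minimal sketch, not a tested fix: the "service" pillar key is a name invented for this example, and the service state is only rendered when that key is supplied.

```yaml
# update-package.sls (sketch): package and service are passed separately.
# The "service" pillar key is an assumption introduced for this example.
{% set package = salt['pillar.get']('package') %}
{% set service = salt['pillar.get']('service', '') %}

upgrade_{{ package }}:
  pkg.latest:
    - name: {{ package }}

{% if service %}
# Only rendered when a service name was supplied in pillar.
restart_{{ service }}:
  service.running:
    - name: {{ service }}
    - watch:
      - pkg: upgrade_{{ package }}
{% endif %}
```

With this layout, a package such as systemd would be upgraded with only the "package" key set, while a package whose unit has a different name (for example openssh-server, whose unit on Debian-family systems is ssh) would pass both keys.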

OrangeDog avatar Aug 26 '24 17:08 OrangeDog

I would suggest there is a bug. Incorrect Salt code should not crash the minion. I know virtually nothing about the internals of Salt, but a wild guess is that service.running assumes a unit named systemd can be started, stopped, and queried through systemctl. The implementation of service.running should check this assumption first and raise an error when the name does not satisfy it. Why this crashes the minion is a question that I cannot answer.
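
Until such a check exists on that code path, a guard can be approximated in the SLS itself. This is a rough sketch, assuming the execution function service.available reports whether the named unit exists on the minion. Whether it prevents the crash described above has not been verified.

```yaml
{% set package = salt['pillar.get']('package') %}

upgrade_{{ package }}:
  pkg.latest:
    - name: {{ package }}

# service.available is evaluated when the SLS is rendered, that is, before
# pkg.latest runs, so a unit first installed by the upgrade would be missed.
{% if salt['service.available'](package) %}
restart_{{ package }}:
  service.running:
    - name: {{ package }}
    - watch:
      - pkg: upgrade_{{ package }}
{% endif %}
```

For a name like systemd, which has no matching unit, the service state would simply not be rendered, so nothing would attempt to stop or start systemd.service.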

dnessett avatar Aug 26 '24 17:08 dnessett

Can you actually reproduce it crashing the minion? In the first log, the minion exits one second after the failed attempt to restart a non-existent service.

OrangeDog avatar Aug 26 '24 17:08 OrangeDog

Well, as I mentioned in the original post, it is hard to reproduce this bug, since once systemd is updated to its current version, the minion does not crash. It returns:

          ID: restart_minion
    Function: service.running
        Name: systemd
      Result: False
     Comment: The named service systemd is not available
     Started: 14:40:12.166861
    Duration: 14.847 ms
     Changes:   

This suggests that it is possible to catch the error when service.running is called after the package is updated. In order for me to reproduce the error I would have to back out the update and return systemd to its previous version (249.11-0ubuntu3.11). I am not an expert on apt and so don't really know how to do this. If someone can suggest how to downgrade a package to a previous version, I would be willing to try the failing upgrade on my development system.

dnessett avatar Aug 26 '24 17:08 dnessett