[BUG] Beacon service event is sent on highstate
Description
While testing v3005 (upgraded from v3002.2), we found that when `emitatstartup` is `True` (the default), the beacon does not only fire when a salt-minion is restarted, but also during:

1. a `saltutil.refresh_pillar`
2. a highstate

We rely heavily on beacons firing only during very specific events, and have worked around (1) in previous versions. However, (2) causes a beacon to trigger a reactor which ultimately, at least for one of the service beacons, causes a highstate to run. In v3005 this now results in an endless loop.
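The reactor side of this is conceptually just a master-config mapping from the beacon event tag to a reactor SLS; a simplified sketch, with illustrative paths and names rather than our exact configuration:

```yaml
# /etc/salt/master.d/reactor.conf -- illustrative path and SLS name, not our exact config
reactor:
  - 'salt/beacon/*/service/*':        # matches the service beacon events shown below
    - /srv/reactor/service_beacon.sls
```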
Setup
The setup is a test environment: a single LXD container running both salt-master and salt-minion.
Please be as specific as possible and give set-up details.
- [ ] on-prem machine
- [ ] VM (Virtualbox, KVM, etc. please specify)
- [ ] VM running on a cloud service, please be explicit and add details
- [x] container (Kubernetes, Docker, containerd, etc. please specify) LXD 5.4
- [ ] or a combination, please be explicit
- [ ] jails if it is FreeBSD
- [ ] classic packaging
- [ ] onedir packaging
- [ ] used bootstrap to install
Steps to Reproduce the behavior
- Beacon config (see the YAML sketch after these steps for a more readable form):

d1624691-10ed-4d35-b66d-0c4bfa00b4ed:
    ----------
    beacons:
        ----------
        service:
            |_
              ----------
              services:
                  ----------
                  rngd:
                      ----------
                      delay:
                          1
                      emitatstartup:
                          False
                      onchangeonly:
                          True
                  sshd:
                      ----------
                      delay:
                          1
                      onchangeonly:
                          True
            |_
              ----------
              interval:
                  5
            |_
              ----------
              disable_during_state_run:
                  True
- Running pillar refresh:
$ date; salt '*' saltutil.refresh_pillar
Mon Sep 19 17:30:56 CEST 2022
d1624691-10ed-4d35-b66d-0c4bfa00b4ed:
    True
- Results in only the service that did not have `emitatstartup` set to `False` firing an event to the master (expected):
minion/refresh/d1624691-10ed-4d35-b66d-0c4bfa00b4ed {"Minion data cache refresh": "d1624691-10ed-4d35-b66d-0c4bfa00b4ed", "_stamp": "2022-09-19T15:31:47.747036"}
salt/beacon/d1624691-10ed-4d35-b66d-0c4bfa00b4ed/service/sshd {"sshd": {"running": true}, "service_name": "sshd", "id": "d1624691-10ed-4d35-b66d-0c4bfa00b4ed", "_stamp": "2022-09-19T15:31:58.395365"}
- Issue highstate:
$ date; salt '*' state.apply --output-diff
Mon Sep 19 17:33:01 CEST 2022
- Results in the beacon sending a state-change event to the master after the highstate:
minion/refresh/d1624691-10ed-4d35-b66d-0c4bfa00b4ed {"Minion data cache refresh": "d1624691-10ed-4d35-b66d-0c4bfa00b4ed", "_stamp": "2022-09-19T15:33:04.909441"}
salt/job/20220919153302772294/ret/d1624691-10ed-4d35-b66d-0c4bfa00b4ed {"cmd": "_return", "id": "d1624691-10ed-4d35-b66d-0c4bfa00b4ed", "success": true, "return": { ... }}, "retcode": 0, "jid": "20220919153302772294", "fun": "state.apply", "fun_args": [], "out": "highstate", "_stamp": "2022-09-19T15:33:26.214007"}
salt/beacon/d1624691-10ed-4d35-b66d-0c4bfa00b4ed/service/sshd {"sshd": {"running": true}, "service_name": "sshd", "id": "d1624691-10ed-4d35-b66d-0c4bfa00b4ed", "_stamp": "2022-09-19T15:33:32.411391"}
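For readability, the beacon configuration shown in the first step corresponds roughly to the following YAML pillar definition (reconstructed from the nested output above; a sketch rather than the literal pillar file):

```yaml
beacons:
  service:
    - services:
        rngd:
          delay: 1
          emitatstartup: False
          onchangeonly: True
        sshd:
          delay: 1
          onchangeonly: True
    - interval: 5
    - disable_during_state_run: True
```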
Expected behavior
Beacons don't trigger on highstate. This seems to be a regression, at least when comparing behavior with v3002.2 and earlier 2019 versions.
Versions Report
salt --versions-report

Salt Version:
          Salt: 3005

Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.8.1
       libgit2: Not Installed
      M2Crypto: 0.33.0
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: Not Installed
        pygit2: Not Installed
        Python: 3.6.8 (default, Nov 16 2020, 16:55:22)
  python-gnupg: Not Installed
        PyYAML: 3.11
         PyZMQ: 18.0.1
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.1.4

System Versions:
          dist: centos 7 Core
        locale: UTF-8
       machine: x86_64
       release: 5.15.0-46-generic
        system: Linux
       version: CentOS Linux 7 Core
Additional context
Related: #62675
Are you able to test the fix for that and see if it makes a difference?
Should've mentioned: the above results are already from v3005 with the patch from PR https://github.com/saltstack/salt/pull/62682 applied. The behavior of v3005 without the patch is the same.
Fwiw, we're currently running v3002.2 and when running state.apply we do see:
minion/refresh/daee17ea-5b61-4d80-8d01-6171036d5e65 {"Minion data cache refresh": "daee17ea-5b61-4d80-8d01-6171036d5e65", "_stamp": "2022-09-27T07:48:47.669963"}
But we don't see beacons that have `emitatstartup` and `onchangeonly` set to `True` being triggered. With v3005, by contrast, issuing a `state.apply` does result in the beacon sending a change event:
minion/refresh/daee17ea-5b61-4d80-8d01-6171036d5e65 {"Minion data cache refresh": "daee17ea-5b61-4d80-8d01-6171036d5e65", "_stamp": "2022-09-27T07:41:19.491791"}
salt/beacon/daee17ea-5b61-4d80-8d01-6171036d5e65/service/sshd {"sshd": {"running": true}, "service_name": "sshd", "id": "daee17ea-5b61-4d80-8d01-6171036d5e65", "_stamp": "2022-09-27T07:41:25.939968"}
In both versions, we see `saltutil.refresh_pillar` resulting in beacons getting their state reset. That is not ideal either, but as mentioned, we have worked around it for now. However, a highstate resetting a beacon's state, which then triggers a highstate from a reactor, is a showstopper for us. We also depend on a few other recent bugfixes around beacons, so we're getting severely stuck here. Is this bug scheduled for a specific version, and is there anything to share on a potential release date?
As a follow-up: we're now testing v3006.1 heavily, and I can't seem to reproduce the beacon events being sent on highstate, nor on pillar refresh.
However, we're now seeing that service beacons that do not have `emitatstartup` set to `False` (i.e. it is `True`, the default) are triggered more or less at random. The following is logged while no service changes are happening at all:
2023-06-15 16:40:10,744 [salt.utils.event :823 ][DEBUG ][60199] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:40:10.744262'}
2023-06-15 16:41:27,740 [salt.utils.event :823 ][DEBUG ][60186] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:41:27.740434'}
2023-06-15 16:42:44,732 [salt.utils.event :823 ][DEBUG ][60201] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:42:44.732792'}
2023-06-15 16:44:01,743 [salt.utils.event :823 ][DEBUG ][60199] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:44:01.743347'}
2023-06-15 16:45:18,747 [salt.utils.event :823 ][DEBUG ][60186] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:45:18.746930'}
2023-06-15 16:46:35,747 [salt.utils.event :823 ][DEBUG ][60199] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:46:35.747352'}
2023-06-15 16:47:52,749 [salt.utils.event :823 ][DEBUG ][60199] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:47:52.749045'}
2023-06-15 16:49:09,735 [salt.utils.event :823 ][DEBUG ][60186] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:49:09.734951'}
2023-06-15 16:50:26,744 [salt.utils.event :823 ][DEBUG ][60199] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:50:26.743985'}
2023-06-15 16:51:43,769 [salt.utils.event :823 ][DEBUG ][60186] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:51:43.769658'}
2023-06-15 16:53:00,743 [salt.utils.event :823 ][DEBUG ][60197] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:53:00.743301'}
2023-06-15 16:54:18,753 [salt.utils.event :823 ][DEBUG ][60184] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:54:18.753348'}
2023-06-15 16:55:35,750 [salt.utils.event :823 ][DEBUG ][60197] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-15T14:55:35.750024'}
I.e. the event is sent by the minion about every minute. We also have some other service beacons with `emitatstartup` set to `False`, and the behavior of those seems fine. For the salt-minion beacon, though, we can't disable `emitatstartup` because we rely on it to run an orchestrate whenever the node boots (and thus the minion starts).
Given that `grains_refresh_every` now defaults to 1, I've also tried larger values, as well as disabling it (0), to see whether that had any effect on the above. Fwiw, it did not.
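For reference, that was set via a minion-config drop-in roughly like the following sketch (the file path is illustrative; the value is in minutes and 0 disables the periodic grains refresh):

```yaml
# /etc/salt/minion.d/grains.conf -- illustrative path
# 0 disables the periodic grains refresh; v3006 defaults to 1 (minute)
grains_refresh_every: 0
```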
salt --versions-report

Salt Version:
          Salt: 3006.1

Python Version:
        Python: 3.10.11 (main, May 5 2023, 02:31:54) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 5.4.1
         PyZMQ: 23.2.0
        relenv: 0.12.3
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: centos 7.6.1810 Core
        locale: utf-8
       machine: x86_64
       release: 5.15.0-56-generic
        system: Linux
       version: CentOS Linux 7.6.1810 Core
Sorry for any confusion, but the root cause is actually still just the fact that the event is sent on every highstate and pillar refresh. After testing with @whytewolf yesterday, who wasn't seeing the issue with the exact same beacon configuration, I looked at the orchestration that's triggered by the service event, which in turn triggers a highstate run:
state_apply:
  salt.state:
    - tgt: {{ target }}
    - tgt_type: list
    - queue: True
    - highstate: True
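For completeness, the reactor SLS that kicks off this orchestrate is essentially a `runner.state.orchestrate` call; a simplified sketch (the SLS names and pillar key are illustrative, not our exact files):

```yaml
# /srv/reactor/service_beacon.sls -- illustrative, simplified
run_state_apply_orch:
  runner.state.orchestrate:
    - args:
      - mods: orch.state_apply          # the orchestrate shown above
      - pillar:
          target: {{ data['id'] }}      # minion id taken from the beacon event
```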
Currently testing with beacons defined in the minion config file `/etc/salt/minion.d/beacons.conf`:
beacons:
  service:
    - services:
        salt-minion:
          delay: 10
          onchangeonly: True
          emitatstartup: True
    - interval: 5
    - disable_during_state_run: False
So the above:
- Sends an event to the master 10 seconds after the minion starts
- Triggers a reactor, which then calls the orchestrate that runs the highstate (this ensures the minion node is up to date, e.g. after having been down for a period of time)
- Creates an endless loop
Verified this by disabling all reactors and running a highstate manually:
$ date; salt $( cat /etc/salt/minion_id ) state.highstate &>/dev/null; date
Fri Jun 16 11:58:38 CEST 2023
Fri Jun 16 11:59:12 CEST 2023
This consistently produces:
2023-06-16 11:59:02,759 [salt.utils.event :823 ][DEBUG ][21548] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-16T09:59:02.759010'}
Similar with a pillar refresh:
$ date; salt $( cat /etc/salt/minion_id ) saltutil.refresh_pillar &>/dev/null; date
Fri Jun 16 12:01:24 CEST 2023
Fri Jun 16 12:01:25 CEST 2023
Produces:
2023-06-16 12:01:47,754 [salt.utils.event :823 ][DEBUG ][21550] Sending event: tag = salt/beacon/3deb8cd1-79cc-483e-a3df-929ba58f9b9f/service/salt-minion; data = {'salt-minion': {'running': True}, 'service_name': 'salt-minion', 'id': '3deb8cd1-79cc-483e-a3df-929ba58f9b9f', '_stamp': '2023-06-16T10:01:47.753924'}
Fwiw, running any of our states individually using `state.sls <state>` does not trigger an event.