salt icon indicating copy to clipboard operation
salt copied to clipboard

[BUG] Hourly presence change events received on master

Open gbunt opened this issue 2 years ago • 5 comments

Description We're currently running into the following with v3006.1: Regardless of any actual changes, the master receives presence/change every hour for all minions. This is not the behavior we had in v3002, and presence change events in our environment can trigger loads of orchestrate states to run, and all at once now.

Setup (Please provide relevant configs and/or SLS files (be sure to remove sensitive info. There is no general set-up of Salt.)

Please be as specific as possible and give set-up details.

  • [ ] on-prem machine
  • [ ] VM (Virtualbox, KVM, etc. please specify)
  • [ ] VM running on a cloud service, please be explicit and add details
  • [x] container (Kubernetes, Docker, containerd, etc. please specify) LXD
  • [ ] or a combination, please be explicit
  • [ ] jails if it is FreeBSD
  • [ ] classic packaging
  • [x] onedir packaging
  • [ ] used bootstrap to install

Steps to Reproduce the behavior debug logging:

2023-06-18 04:04:43,670 [salt.utils.event :823 ][DEBUG   ][22091] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T02:04:43.670825'}
2023-06-18 05:05:16,638 [salt.utils.event :823 ][DEBUG   ][24066] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T03:05:16.638313'}
2023-06-18 06:05:49,553 [salt.utils.event :823 ][DEBUG   ][25224] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T04:05:49.553452'}
2023-06-18 07:06:22,526 [salt.utils.event :823 ][DEBUG   ][26719] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T05:06:22.526803'}
2023-06-18 08:06:55,540 [salt.utils.event :823 ][DEBUG   ][28012] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T06:06:55.540804'}
2023-06-18 09:07:28,490 [salt.utils.event :823 ][DEBUG   ][29283] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T07:07:28.490620'}
2023-06-18 10:08:01,368 [salt.utils.event :823 ][DEBUG   ][30765] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T08:08:01.368235'}
2023-06-18 11:08:34,233 [salt.utils.event :823 ][DEBUG   ][31988] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T09:08:34.233706'}
2023-06-18 12:09:07,234 [salt.utils.event :823 ][DEBUG   ][33340] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T10:09:07.234497'}
2023-06-18 13:09:40,190 [salt.utils.event :823 ][DEBUG   ][34530] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T11:09:40.190707'}
2023-06-18 14:10:13,058 [salt.utils.event :823 ][DEBUG   ][36180] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T12:10:13.058599'}
2023-06-18 15:10:46,088 [salt.utils.event :823 ][DEBUG   ][37798] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T13:10:46.088279'}
2023-06-18 16:11:19,078 [salt.utils.event :823 ][DEBUG   ][39355] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T14:11:19.078218'}
2023-06-18 17:11:52,016 [salt.utils.event :823 ][DEBUG   ][40622] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T15:11:52.016171'}
2023-06-18 18:12:24,950 [salt.utils.event :823 ][DEBUG   ][42013] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T16:12:24.950058'}
2023-06-18 19:12:57,931 [salt.utils.event :823 ][DEBUG   ][43556] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T17:12:57.931541'}
2023-06-18 20:13:30,817 [salt.utils.event :823 ][DEBUG   ][44718] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T18:13:30.816871'}
2023-06-18 21:14:03,799 [salt.utils.event :823 ][DEBUG   ][46482] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T19:14:03.799817'}
2023-06-18 22:14:46,898 [salt.utils.event :823 ][DEBUG   ][47974] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T20:14:46.898101'}
2023-06-18 23:15:19,799 [salt.utils.event :823 ][DEBUG   ][49405] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T21:15:19.799625'}
2023-06-19 00:15:52,709 [salt.utils.event :823 ][DEBUG   ][50522] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T22:15:52.709098'}
2023-06-19 01:16:25,736 [salt.utils.event :823 ][DEBUG   ][52154] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-18T23:16:25.736522'}
2023-06-19 02:16:58,649 [salt.utils.event :823 ][DEBUG   ][53452] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T00:16:58.649243'}
2023-06-19 03:17:31,533 [salt.utils.event :823 ][DEBUG   ][54792] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T01:17:31.533062'}
2023-06-19 04:18:04,528 [salt.utils.event :823 ][DEBUG   ][56938] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T02:18:04.528774'}
2023-06-19 05:18:37,484 [salt.utils.event :823 ][DEBUG   ][58051] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T03:18:37.484359'}
2023-06-19 06:19:10,401 [salt.utils.event :823 ][DEBUG   ][59426] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T04:19:10.401127'}
2023-06-19 07:19:43,267 [salt.utils.event :823 ][DEBUG   ][60559] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T05:19:43.267281'}
2023-06-19 08:20:16,253 [salt.utils.event :823 ][DEBUG   ][61818] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T06:20:16.253843'}
2023-06-19 09:20:49,215 [salt.utils.event :823 ][DEBUG   ][14118] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T07:20:49.215430'}
2023-06-19 10:21:32,676 [salt.utils.event :823 ][DEBUG   ][22258] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T08:21:32.676621'}
2023-06-19 11:22:05,217 [salt.utils.event :823 ][DEBUG   ][25328] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T09:22:05.217491'}
2023-06-19 12:22:38,102 [salt.utils.event :823 ][DEBUG   ][31828] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T10:22:38.102539'}
2023-06-19 13:23:11,098 [salt.utils.event :823 ][DEBUG   ][37177] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T11:23:11.097996'}
2023-06-19 14:23:44,048 [salt.utils.event :823 ][DEBUG   ][38463] Sending event: tag = salt/presence/change; data = {'new': ['d201d0cd-8883-475f-9b57-883336202e46', '8862d39d-77b0-4b7d-9d7c-e47a7bd00ca5', '44da1045-eb8a-4264-b0ab-d4953a8730ca'], 'lost': [], '_stamp': '2023-06-19T12:23:44.048204'}

Relevant master config:

presence_events: True

tcp_keepalive: True
tcp_keepalive_idle: 10
tcp_keepalive_cnt: 3
tcp_keepalive_intvl: 10

Expected behavior presence/change events are only fired when minions are actually disconnected / gone or initiate a new TCP connection

Screenshots If applicable, add screenshots to help explain your problem.

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3006.1
 
Python Version:
        Python: 3.10.11 (main, May  5 2023, 02:31:54) [GCC 11.2.0]
 
Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: Not Installed
  python-gnupg: 0.4.8
        PyYAML: 5.4.1
         PyZMQ: 23.2.0
        relenv: 0.12.3
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4
 
System Versions:
          dist: centos 7.6.1810 Core
        locale: utf-8
       machine: x86_64
       release: 5.15.0-56-generic
        system: Linux
       version: CentOS Linux 7.6.1810 Core

Additional context Add any other context about the problem here.

gbunt avatar Jun 19 '23 12:06 gbunt

Possibly caused by #63941

OrangeDog avatar Jun 20 '23 09:06 OrangeDog

Possibly caused by https://github.com/saltstack/salt/pull/63941

Agreed! Possible fix at https://github.com/saltstack/salt/pull/64508

raddessi avatar Jun 20 '23 17:06 raddessi

We have the same issue here: Salt Minion Versions: Salt Version: Salt: 3006.3

Python Version: Python: 3.10.13 (main, Sep 6 2023, 02:11:27) [GCC 11.2.0]

Dependency Versions: cffi: 1.14.6 cherrypy: 18.6.1 dateutil: 2.8.1 docker-py: Not Installed gitdb: Not Installed gitpython: Not Installed Jinja2: 3.1.2 libgit2: Not Installed looseversion: 1.0.2 M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.2 msgpack-pure: Not Installed mysql-python: Not Installed packaging: 22.0 pycparser: 2.21 pycrypto: Not Installed pycryptodome: 3.9.8 pygit2: Not Installed python-gnupg: 0.4.8 PyYAML: 6.0.1 PyZMQ: 23.2.0 relenv: 0.13.10 smmap: Not Installed timelib: 0.2.4 Tornado: 4.5.3 ZMQ: 4.3.4

System Versions: dist: ubuntu 22.04.4 jammy locale: utf-8 machine: x86_64 release: 5.15.0-97-generic system: Linux version: Ubuntu 22.04.4 jammy

Salt Master Version:

Salt Version: Salt: 3006.7

Python Version: Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]

Dependency Versions: cffi: 1.14.6 cherrypy: unknown dateutil: 2.8.1 docker-py: Not Installed gitdb: Not Installed gitpython: Not Installed Jinja2: 3.1.3 libgit2: Not Installed looseversion: 1.0.2 M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.2 msgpack-pure: Not Installed mysql-python: Not Installed packaging: 22.0 pycparser: 2.21 pycrypto: Not Installed pycryptodome: 3.19.1 pygit2: Not Installed python-gnupg: 0.4.8 PyYAML: 6.0.1 PyZMQ: 23.2.0 relenv: 0.15.1 smmap: Not Installed timelib: 0.2.4 Tornado: 4.5.3 ZMQ: 4.3.4

System Versions: dist: ubuntu 22.04.4 jammy locale: utf-8 machine: x86_64 release: 5.15.0-97-generic system: Linux version: Ubuntu 22.04.4 jammy

About every hour all minions marked as "new" within event "/presence/changed" although there was no "lost" event before.

We are using a python script as systemd service to alert us in case of presence change:

import logging import salt.config import salt.utils.event from create_opsgenie_alert import create_opsgenie_alerts

logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%d.%m.%Y %H:%M:%S ', level=logging.INFO)

opts = salt.config.client_config("/etc/salt/master/")

event = salt.utils.event.get_event("master", sock_dir=opts["sock_dir"], opts=opts)

for data in event.iter_events(tag="salt/presence/change"): logging.info("TAG = salt/presence/change") logging.info("Data:\n" + data + "\n") logging.info(data.get('new', '--')) logging.info(data.get('lost', '--')) # print(data.get('_stamp')) alertmsg = "Salt Minion Presence State changed..." message = "Minion presence changed:\n new: {},\n lost: {}".format(data.get('new'), data.get('lost')) logging.info(message) # fire opsgenie alert create_opsgenie_alerts(alertmsg,'Message:\n {}'.format(message), "Salt Minion State Change Alert","P3")

stephanwiechert avatar Mar 12 '24 07:03 stephanwiechert

Apologies, I need to update the tests for the fix in https://github.com/saltstack/salt/pull/64508 still then it should be ready to merge. My evenings will be busy for a while so I can't say when I will have time to do this

raddessi avatar Apr 04 '24 17:04 raddessi

Any news here? We handling every day 24 alerts because of this issue because we are checking the salt minion status with that event. It would be nice to have a workaround until we have a fix, if there is one.

stephanwiechert avatar May 03 '24 07:05 stephanwiechert

Sorry, I have an infant taking up all my spare time :) There is only one test case needed to be added if I recall. I can write down what it is if you want to take a stab at it. It's possible it will require a logic change but I don't recall honestly, it's been to long

raddessi avatar Jun 05 '24 18:06 raddessi

You can always patch your master with the change from https://github.com/saltstack/salt/pull/64508, that is what I have done locally FWIW

raddessi avatar Jun 05 '24 18:06 raddessi