[BUG] TCP Publish Client encountered an exception while connecting to /var/run/salt/master/master_event_pub.ipc

Open BeehiveSystems opened this issue 1 year ago • 7 comments

Description: Improper startup followed by Python errors.

Setup: New Rocky Linux 9 master, also running a minion on itself for testing. FirewallD disabled, SELinux set to permissive.

  • [X] on-prem machine
  • [X] VM (vSphere)
  • [ ] VM running on a cloud service, please be explicit and add details
  • [ ] container (Kubernetes, Docker, containerd, etc. please specify)
  • [ ] or a combination, please be explicit
  • [ ] jails if it is FreeBSD
  • [ ] classic packaging
  • [X] onedir packaging
  • [ ] used bootstrap to install

Steps to Reproduce the behavior: Install the 3007 onedir version for RHEL 9 following the instructions here: https://docs.saltproject.io/salt/install-guide/en/latest/topics/install-by-operating-system/rhel.html#install-salt-on-redhat-rhel-9-x86-64

Expected behavior: Normal startup of the salt-master.

Screenshots: See log output below.

Versions Report

salt --versions-report
Salt Version:
          Salt: 3007.0
 
Python Version:
        Python: 3.10.13 (main, Feb 19 2024, 03:31:20) [GCC 11.2.0]
 
Dependency Versions:
          cffi: 1.16.0
      cherrypy: unknown
      dateutil: 2.8.2
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.3
       libgit2: Not Installed
  looseversion: 1.3.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.7
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 23.1
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.19.1
        pygit2: Not Installed
  python-gnupg: 0.5.2
        PyYAML: 6.0.1
         PyZMQ: 25.1.2
        relenv: 0.15.1
         smmap: Not Installed
       timelib: 0.3.0
       Tornado: 6.3.3
           ZMQ: 4.3.4
 
Salt Package Information:
  Package Type: onedir
 
System Versions:
          dist: rocky 9.4 Blue Onyx
        locale: utf-8
       machine: x86_64
       release: 5.14.0-427.16.1.el9_4.x86_64
        system: Linux
       version: Rocky Linux 9.4 Blue Onyx

Additional context

2024-05-20 22:14:33,428 [salt.transport.tcp:311 ][WARNING ][24352] TCP Publish Client encountered an exception while connecting to /var/run/salt/master/master_event_pub.ipc: StreamClosedError('Stream is closed'), will reconnect in 1 seconds -   File "/usr/bin/salt-master", line 11, in <module>
    sys.exit(salt_master())

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/scripts.py", line 88, in salt_master
    master.start()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/cli/daemons.py", line 224, in start
    self.master.start()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/master.py", line 823, in start
    self.process_manager.add_process(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/process.py", line 530, in add_process
    process.start()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/process.py", line 1099, in start
    super().start()

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
    return Popen(process_obj)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/popen_fork.py", line 71, in _launch
    code = process_obj._bootstrap(parent_sentinel=child_r)

  File "/opt/saltstack/salt/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/process.py", line 994, in wrapped_run_func
    return run_func()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/master.py", line 995, in run
    with salt.utils.event.get_master_event(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 150, in get_master_event
    return MasterEvent(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 928, in __init__
    super().__init__(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 265, in __init__
    self.connect_pub()

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/event.py", line 348, in connect_pub
    self.subscriber = salt.transport.ipc_publish_client(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 210, in ipc_publish_client
    return publish_client(opts, io_loop, **kwargs)

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 152, in publish_client
    return salt.transport.tcp.PublishClient(

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/tcp.py", line 219, in __init__
    super().__init__(opts, io_loop, **kwargs)

  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/base.py", line 398, in __init__
    super().__init__()

2024-05-20 22:14:33,432 [salt.transport.tcp:1405][ERROR   ][24350] Publish server binding pub to /var/run/salt/master/master_event_pub.ipc ssl=None
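
The two log records above share one layout: timestamp, `[module:line]`, `[LEVEL]`, `[pid]`, then the message. Below is a minimal sketch for tallying records by module and level when sifting through a long master log. The regular expression is inferred only from the lines quoted in this issue (a different `log_fmt_logfile` setting would need a different pattern), and `parse_line` and `count_by_module_and_level` are hypothetical helper names, not part of Salt.

```python
import re
from collections import Counter

# Layout inferred from the log lines quoted in this issue (an assumption,
# not an official Salt format specification).
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<module>[\w.]+):(?P<lineno>\d+)\s*\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<pid>\d+)\] "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a log record, or None for other lines
    (for example the indented traceback frames)."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def count_by_module_and_level(lines):
    """Count log records per (module, level) pair, skipping non-record lines."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["module"], record["level"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2024-05-20 22:14:33,432 [salt.transport.tcp:1405][ERROR   ][24350] "
        "Publish server binding pub to "
        "/var/run/salt/master/master_event_pub.ipc ssl=None",
        '  File "/usr/bin/salt-master", line 11, in <module>',
    ]
    for key, total in count_by_module_and_level(sample).items():
        print(key, total)
```

Feeding it the lines of the master log file would show quickly whether `salt.transport.tcp` dominates the warnings and errors.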

BeehiveSystems avatar May 21 '24 02:05 BeehiveSystems

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, please be sure to review our Code of Conduct.

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar. We’re glad you’ve joined our community and look forward to doing awesome things with you!

welcome[bot] avatar May 21 '24 02:05 welcome[bot]

Running into the same issue on a years old installation. Any ideas how to fix?

curry684 avatar Jun 18 '24 08:06 curry684

> Running into the same issue on a years old installation. Any ideas how to fix?

What Salt master version?

I ended up using Salt from the Rocky repo (3005) instead of 3007 from the Salt repo.

BeehiveSystems avatar Jun 18 '24 08:06 BeehiveSystems

Mmm, a reboot did some of the job. I found a bunch of reports elsewhere that 3007 changed the permission model to run more Salt components as a non-root user instead of as root, which more or less leaves /var/run/salt in a broken state until a full restart.

Minion still broken on the master though, looking into that now.
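
If the ownership theory above is right, files under /var/run/salt left over from an earlier root-run daemon would show an unexpected owner. Below is a minimal sketch for listing them. It assumes the master now runs as a user named `salt`, which this thread does not confirm (check the `user` setting in your master config), and `describe` and `find_mismatched` are hypothetical helper names.

```python
import os
import pwd
import stat
from pathlib import Path


def describe(path):
    """Return owner, mode string, and socket flag for a path (no symlink follow)."""
    st = os.lstat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid with no passwd entry
    return {
        "path": str(path),
        "owner": owner,
        "mode": stat.filemode(st.st_mode),
        "is_socket": stat.S_ISSOCK(st.st_mode),
    }


def find_mismatched(root, expected_owner):
    """List entries under root (including root) not owned by expected_owner."""
    root = Path(root)
    if not root.exists():
        return []
    mismatched = []
    for entry in [root, *sorted(root.rglob("*"))]:
        try:
            info = describe(entry)
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        if info["owner"] != expected_owner:
            mismatched.append(info)
    return mismatched


if __name__ == "__main__":
    # "salt" is an assumed service user, not something confirmed in this issue.
    for info in find_mismatched("/var/run/salt", "salt"):
        print(info["mode"], info["owner"], info["path"])
```

Run it as root so every entry is readable. An empty result would suggest ownership is not the problem on that machine.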

curry684 avatar Jun 18 '24 08:06 curry684

Minion is still spamming tons of errors but resumed working on its own shortly after reboot:

2024-06-18 08:50:05,586 [salt.transport.zeromq:396 ][ERROR   ][776] Exception while running callback
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/transport/zeromq.py", line 394, in consume
    await callback(msg)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/channel/client.py", line 484, in wrap_callback
    await callback(decoded)
TypeError: object NoneType can't be used in 'await' expression
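
For readers unfamiliar with this TypeError: Python raises it when code awaits the return value of an ordinary (non-async) function that returned None. Below is a generic sketch of the failure and of one defensive pattern, `inspect.isawaitable`. It is an illustration only, not Salt's code and not necessarily how the project addresses it, and all function names are made up for the example.

```python
import asyncio
import inspect


def sync_handler(msg):
    """A plain callback: calling it returns None, which cannot be awaited."""
    return None


async def async_handler(msg):
    """A coroutine callback: calling it returns an awaitable."""
    return msg.upper()


async def broken_dispatch(callback, msg):
    # Raises TypeError when callback is a plain function returning None.
    await callback(msg)


async def tolerant_dispatch(callback, msg):
    """Call the callback and await the result only if it is awaitable."""
    result = callback(msg)
    if inspect.isawaitable(result):
        return await result
    return result


if __name__ == "__main__":
    print(asyncio.run(tolerant_dispatch(async_handler, "ok")))
    print(asyncio.run(tolerant_dispatch(sync_handler, "ok")))
    try:
        asyncio.run(broken_dispatch(sync_handler, "ok"))
    except TypeError as exc:
        print("TypeError:", exc)
```

The traceback above suggests the callback handed to `wrap_callback` returned None rather than an awaitable, but which callback that was cannot be determined from this log alone.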

curry684 avatar Jun 18 '24 08:06 curry684

I can confirm this is still an issue on Ubuntu 22.04 and 3007.1. I originally submitted this bug under CentOS in my home environment but ran into the issue at work as well on a clean server.

I attempted fixing it with:

sudo rm -f /var/run/salt/master/master_event_pub.ipc
sudo systemctl restart salt-master

But that did not work. Like you said, a full reboot brought the service back to a healthy state.
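
Before deleting the socket file, it can help to check whether anything is actually listening on it. Below is a minimal sketch that assumes the IPC path is a stream-type Unix domain socket (an assumption, not verified against Salt's source). `probe_unix_socket` is a hypothetical helper.

```python
import os
import socket
import stat


def probe_unix_socket(path, timeout=1.0):
    """Classify a Unix socket path as missing, not-a-socket, stale,
    permission-denied, listening, or another error."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "permission-denied"
    if not stat.S_ISSOCK(st.st_mode):
        return "not-a-socket"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except ConnectionRefusedError:
            return "stale"  # file exists but no process is accepting on it
        except PermissionError:
            return "permission-denied"
        except OSError as exc:
            return f"error: {exc}"
    return "listening"


if __name__ == "__main__":
    # Path taken from the log message in this issue.
    print(probe_unix_socket("/var/run/salt/master/master_event_pub.ipc"))
```

A result of "stale" right after a restart would mean the file was left behind with no listener. "permission-denied" would point toward the ownership problem discussed earlier in the thread.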

BeehiveSystems avatar Jun 20 '24 01:06 BeehiveSystems

I'm seeing the same issue. Both the salt server and the minion are on v3007.0 on Rocky Linux 8. Rebooting both didn't clear the messages.

mshields-frtservices avatar Jun 20 '24 15:06 mshields-frtservices

For anyone interested, we had the same problem and solved it (for better or worse) with this: https://github.com/saltstack/salt/issues/66228#issuecomment-2247358708

damienjbyrne avatar Jul 24 '24 09:07 damienjbyrne