[BUG] 3007.X scheduler_before_connect throws TCP Publish Client encountered an exception, no start event is seen
Description
3007.1 Windows minions with option scheduler_before_connect: True set throws exception TCP Publish Client encountered an exception while connecting to 127.0.0.1:4510: TimeoutError() on start, and does not generate /start event on master event bus
[TRACE ] IPCServer: Handling connection to address: ('127.0.0.1', 52006)
[DEBUG ] The functions from module 'mine' are being loaded by dir() on the loaded module
[DEBUG ] LazyLoaded mine.update
[DEBUG ] schedule.handle_func: adding this job to the jobcache with data {'id': 'minion-win-1', 'fun': 'mine.update', 'fun_args': [], 'schedule': '__mine_interval', 'jid': '20240927221656953118', 'pid': 420}
[TRACE ] ==== evaluating schedule now None =====
[TRACE ] Client disconnected from IPC 127.0.0.1:4511
[WARNING ] TCP Publish Client encountered an exception while connecting to 127.0.0.1:4510: TimeoutError(), will reconnect in 1 seconds - File "C:\Program Files\Salt Project\Salt\Lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Program Files\Salt Project\Salt\Lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Program Files\Salt Project\Salt\salt-minion.exe\__main__.py", line 7, in <module>
sys.exit(salt_minion())
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\scripts.py", line 185, in salt_minion
minion.start()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\cli\daemons.py", line 344, in start
self._real_start()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\cli\daemons.py", line 356, in _real_start
self.minion.tune_in()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1193, in tune_in
self._bind()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\minion.py", line 1064, in _bind
self.event = salt.utils.event.get_event(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\event.py", line 135, in get_event
return SaltEvent(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\event.py", line 265, in __init__
self.connect_pub()
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\utils\event.py", line 348, in connect_pub
self.subscriber = salt.transport.ipc_publish_client(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\base.py", line 210, in ipc_publish_client
return publish_client(opts, io_loop, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\base.py", line 152, in publish_client
return salt.transport.tcp.PublishClient(
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\tcp.py", line 220, in __init__
super().__init__(opts, io_loop, **kwargs)
File "C:\Program Files\Salt Project\Salt\Lib\site-packages\salt\transport\base.py", line 398, in __init__
super().__init__()
[TRACE ] TCP PubServer sending payload: topic_list=None '/salt/minion/minion_schedule_delete_complete\n\n��completeèschedule��__mine_interval��enabledèfunction�mine.update�jid_includeêmaxrunning\x02�minutes<�name�__mine_interval�return_job£runìrun_on_startåsplay��_stamp�2024-09-27T22:16:56.937494'
[TRACE ] TCP PubServer finished publishing payload
[DEBUG ] Decrypting the current master AES key
[DEBUG ] salt.crypt.get_rsa_key: Loading private key
[DEBUG ] Loaded minion key: C:\ProgramData\Salt Project\Salt\conf\pki\minion\minion.pem
No /start event is shown, only auth
salt/auth {
"_stamp": "2024-09-27T22:22:51.680278",
"act": "accept",
"id": "minion-win-1",
"pub": "-----BEGIN PUBLIC KEY-----REDACTED-----END PUBLIC KEY-----",
"result": true
}
minion/refresh/minion-win-1 {
"Minion data cache refresh": "minion-win-1",
"_stamp": "2024-09-27T22:22:51.961685"
}
The minion seems responsive after start, but not having the /start event seems to be a big issue.
The scheduler_before_connect option is required in this use case and works fine in 3006.x versions.
Setup
3007.x Windos minion with scheduler_before_connect: True
Expected behavior
Minion should start without the exception, the scheduler_before_connect should work as expected and /start event should be seeing on the event bus
I discovered a little difference in behavior, if the minion is configured in multimaster (active/active), when the minion is started as service, net start salt-minion, there is no /start event.
But when the minion is started in foreground, salt-minion -l debug, I can randomly get a /start event in one or the other master, but not in both.
With the scheduler_before_connect option removed, both masters get the /start event
The problem seems to be async/await functions not being called properly, something we've seen recently in another minion connection related issues.
@dwoz @twangboy I would like to ask your review on this core changes due the complexities of this part of the code and general minion connection process, I'm not sure if this are how you want thigs to work for the minion connection setup.
The problem seems to be in the
# Main Minion Tune In
def tune_in(self, start=True):
HERE
Inside this function, the function self.setup_scheduler() is called which triggers the exception.
I simply changed the setup_scheduler() function to be async async def setup_scheduler(self, before_connect=False):. Just with this change the exception is not longer thrown.
I also changed the call to this function to use await: await self.setup_scheduler().
In addition, (I will mention this in a new issue), I've observed that this section of the code inside the #Main minion tune in def tune_in(self, start=True):
HERE
which is suppossed to setup_beacons and setup_scheduler when the options beacons_before_connect or scheduler_before_connect are true, does not seem to be ever triggered during the minion start up.
I added some logging messages and never seem to be called with start: True. There's another function async def _connect_minion(self, minion) from which minion.tune_in() is called with start=False.
if start:
if self.opts.get("beacons_before_connect", False):
self.setup_beacons(before_connect=True)
if self.opts.get("scheduler_before_connect", False):
self.setup_scheduler(before_connect=True)
I tried this with multimaster setup, two masters, the tune_in() function seems to be executed once per master connected.
@barneysowood fyi
Can someone please tell me where exactly I have to enter / set scheduler_before_connect: false to get the windows client working again? On the client or on the server? In which file exactly?
Can someone please tell me where exactly I have to enter / set scheduler_before_connect: false to get the windows client working again? On the client or on the server? In which file exactly?
On the minion conf, but it has to be set true, scheduler_before_connect: true
Now I'm confused. The error is triggered because it is set to true by default? That's why I want to turn it off so that the error no longer occurs under Windows?
Does this then have to be entered in minon.conf under program files/saltstack/... must this be entered?
Sorry I misread your comment and thought you wanted to enable this option.
By default, in a fresh minion installation the option is not present, so the minion code will use false as default value, so you shouldn't do anyting else.
In my case, as stated in the issue original description, I had this option set to true becaue I'm explicitly using this option to enable the scheduler run jobs even if the minion has not a connection established to its master.
If you want to explicitly set it, either to true or false, you can tweak the minion configuration file or add a new conf file under minion.d/*.conf