paho.mqtt.python

when client has been disconnected, but client.is_connected() is True

Open showfuture opened this issue 1 year ago • 10 comments

version: 1.6.1 python: 3.9.6

When using paho-mqtt in a weak network environment, the MQTT connection is sometimes actually lost while client.is_connected() still returns True, so the client is still considered connected. I cannot rely on client.is_connected() to determine the connection status, which has led to significant bugs in my business logic. How can I solve this problem?

showfuture, May 13 '24 09:05

The same problem exists in versions 1.6.1 and 2.1.0 and causes bugs.

JiajiaHuang, Jun 19 '24 07:06

Unfortunately it's not going to be possible to help without significantly more information (code, logs etc). In some cases the status of .is_connected() will not change immediately because the connection is half-open, meaning the loss of connection will only be picked up by the keepalive process.
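As a rough illustration of the keepalive point, here is a minimal sketch assuming paho-mqtt 2.x and its VERSION2 callback signatures. The 15-second keepalive and the helper name `build_client` are assumptions made for this example, not values from the thread. The idea is only that a shorter keepalive should shorten the window in which a half-open connection goes unnoticed.

```python
import logging

logger = logging.getLogger("mqtt_keepalive_demo")

# Assumed value for illustration; connect() defaults to keepalive=60.
KEEPALIVE_SECONDS = 15


def on_disconnect(client, userdata, disconnect_flags, reason_code, properties):
    """Log and return immediately; the network loop runs this callback."""
    logger.warning("Disconnected: %s", reason_code)


def build_client(host, port=1883, keepalive=KEEPALIVE_SECONDS):
    """Hypothetical helper: a client with a short keepalive interval."""
    # Imported here so the module still loads where paho-mqtt is missing.
    import paho.mqtt.client as mqtt  # requires paho-mqtt >= 2.0

    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    client.on_disconnect = on_disconnect
    # connect_async() only stores the settings; loop_start() or
    # loop_forever() performs the actual connection.
    client.connect_async(host, port, keepalive=keepalive)
    return client
```

Usage would look like `client = build_client("broker.example.com")` followed by `client.loop_start()`, where the host name is a placeholder.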

MattBrittan, Jul 17 '24 23:07

@MattBrittan I have a similar issue. Occasionally, my connection is lost, e.g. due to a bad internet connection, so I understand the connection is then half-open. How long does the keepalive process need before it finally updates the connection status? This is the first time I have needed to actively maintain the connection. I thought the on_disconnect callback would be the right place to do so, but the flag not being updated makes it hard to implement the right reconnect strategy.

Maybe someone can help with this. I have never really used this function, as the loop usually does everything for me. Is there a best practice for this? Currently, it more or less jumps directly to the return because of the wrong connection status.


    def _mqtt_on_disconnect_callback(
            self, mqttc, userdata, flags, reason_code, properties=None
    ):
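        """
        Callback for disconnecting the client. It will try to reconnect to the
        broker if the reason code is not 0. If the client is not connected
        after the maximum number of retries, it will stop the client.
        For the manual reconnect strategy, the client doubles the
        reconnect interval after each attempt until the maximum number
        of retries is reached.

        Args:
            mqttc: The client instance that triggered the callback.
            userdata: User data set on the client.
            flags: Disconnect flags passed by the library.
            reason_code: Reason code for the disconnect; 0 means a clean
                disconnect.
            properties: MQTT v5 properties, if any.

        Returns:
            None
        """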
        if reason_code.value != 0:
            self.logger.error(
                "Disconnected with error code:  %s: %s", reason_code.value,
                reason_code.getName()
            )
            self.logger.info("Trying to reconnect during the next %s minutes",
                             self.settings.mqtt_reconnect_min_interval *
                             self.settings.mqtt_reconnect_max_retries)

            for _ in range(self.settings.mqtt_reconnect_max_retries):
                # This loop checks during the next x minutes if the client is
                # connected again. It will break if the client is connected,
                # otherwise it will trigger a manual reconnect strategy
                self.logger.info("Waiting for auto-reconnect...")
                if self._mqttc.is_connected():
                    break
                time.sleep(self.settings.mqtt_reconnect_min_interval)
            else:
                self.logger.error(
                    "Auto-reconnect failed after %s mins. "
                    "Stopping the loop and try to manually "
                    "reconnect",
                    self.settings.mqtt_reconnect_min_interval *
                    self.settings.mqtt_reconnect_max_retries
                )

                _reconnect_interval = self.settings.mqtt_reconnect_min_interval
                for _ in range(self.settings.mqtt_reconnect_max_retries):
                    # This loop tries to reconnect manually to the broker
                    if self._mqttc.is_connected():
                        break
                    try:
                        self.logger.info("Trying to manually reconnect...")
                        # self._mqttc.loop_stop()
                        self._mqttc.reconnect()
                        # self._mqttc.loop_start()
                        if self._mqttc.is_connected():
                            break
                    except Exception as err:
                        self.logger.error("Manual reconnect failed: %s", err)
                    # wait before next reconnect attempt
                    time.sleep(_reconnect_interval)
                    # increase the reconnect interval exponentially
                    _reconnect_interval *= 2
                else:
                    self.logger.error(
                        "Manual reconnect failed after %s attempts. "
                        "Trying to restart client...",
                        self.settings.mqtt_reconnect_max_retries
                    )
                    self.logger.info("Stopping client...")
                    self.stop()
                    self.logger.info("Restarting client...")
                    self.start()
                    self.logger.info(
                        "Client restarted. "
                        "Waiting for connection acknowledge...")
                    time.sleep(self.settings.mqtt_reconnect_min_interval)

            if not self._mqttc.is_connected():
                self.logger.error("Reconnect failed. Stopping client...")
                self.stop()
            else:
                return
        else:
            self.logger.debug("Disconnected with reason code: %s: %s",
                              reason_code.value, reason_code.getName())
            self._mqttc.loop_stop()





tstorek, Mar 03 '25 21:03

@tstorek entering a loop like this in a callback will probably not have the desired result, because the network loop will be blocked until your callback returns (and, thus, is_connected will not change). You can find the ping code here (at a glance it looks like it waits for the keepalive period for a response). The library should attempt to reconnect itself if there is no response to the ping (but note that I mainly code in Go and don't really use this library!).
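One way to keep the callback from blocking the network loop is to have it only record the event and return, with any slow handling done on a separate thread. The following is a minimal sketch of that hand-off using only the standard library. The names `events` and `worker` are hypothetical, and the callback signatures assume the VERSION2 callback API of paho-mqtt 2.x.

```python
import queue
import threading

# Events produced by the callbacks, consumed outside the network loop.
events = queue.Queue()


def on_disconnect(client, userdata, disconnect_flags, reason_code, properties):
    """Runs on the network loop thread: enqueue and return at once."""
    events.put(("disconnected", str(reason_code)))


def on_connect(client, userdata, connect_flags, reason_code, properties):
    """Same pattern for connects, so the consumer sees both transitions."""
    events.put(("connected", str(reason_code)))


def worker(handle, stop):
    """Call handle(name, detail) for each event until stop is set."""
    while not stop.is_set():
        try:
            name, detail = events.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            handle(name, detail)  # slow work is safe here
        finally:
            events.task_done()
```

The worker would be started with something like `threading.Thread(target=worker, args=(my_handler, stop_event), daemon=True).start()`, where `my_handler` and `stop_event` are supplied by the application.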

MattBrittan, Mar 03 '25 22:03

@MattBrittan I ran a small test on the function, following the idea about the keepalive period, and that seems to do the trick. The function needs to return once before is_connected is updated. On the second run, my reconnect strategy is triggered. However, the strategy does not stop, because it does not see whether it succeeded. A manual reconnect at least raises an error, so I will use that one.

Still, it is hard to figure out this whole thing. It should be improved, e.g. by a threading event or something. As it stands, is_connected is not of much use, at least not in this context. I would probably need to implement my own flag.
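A self-managed flag of the kind described here could be sketched with a `threading.Event` that the callbacks set and clear. The class name `ConnectionState` is hypothetical, and the callback signatures and the `is_failure` attribute assume the VERSION2 callback API of paho-mqtt 2.x. Note that this does not detect a half-open connection any sooner, since the callbacks only fire once the library itself notices; it just gives other threads something to wait on.

```python
import threading


class ConnectionState:
    """Hypothetical helper mirroring connection state in an Event."""

    def __init__(self):
        self._connected = threading.Event()

    def on_connect(self, client, userdata, connect_flags, reason_code, properties):
        # Only mark as connected when the broker accepted the connection.
        if not reason_code.is_failure:
            self._connected.set()

    def on_disconnect(self, client, userdata, disconnect_flags, reason_code, properties):
        self._connected.clear()

    @property
    def connected(self):
        return self._connected.is_set()

    def wait_until_connected(self, timeout=None):
        """Block until connected; returns False if the timeout expires."""
        return self._connected.wait(timeout)
```

The bound methods would be assigned with `client.on_connect = state.on_connect` and `client.on_disconnect = state.on_disconnect` before the loop is started.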

BTW: is there a timeout for the auto reconnect?

tstorek, Mar 03 '25 22:03

> Still, it is hard to figure out this whole thing. That thing should be improved e.g. by a threading-event or something.

I believe the preferred approach is for the library to manage the connection (and reestablish it if anything goes wrong) so that individual users don't need to be concerned about this (it's difficult to get right). This is what the library appears to do: it should pick up half-open connections via the keepalive process and reconnect automatically. If this is not working, it would be good to get logs so the issue can be resolved (as opposed to working around the problem).

> BTW: is there a timeout for the auto reconnect?

My understanding is that, by default, it will attempt to reconnect indefinitely (loop_forever should really continue working forever). There are settings (reconnect_on_failure etc) that provide some control over this.
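For reference, a minimal sketch of where those settings live, assuming paho-mqtt 2.x. The helper names `build_client` and `run` and the delay values are assumptions for illustration. `reconnect_on_failure` is a constructor argument, and `reconnect_delay_set` bounds the wait between reconnect attempts.

```python
def build_client(min_delay=1, max_delay=120):
    """Hypothetical helper showing the reconnect-related settings."""
    # Imported here so the module still loads where paho-mqtt is missing.
    import paho.mqtt.client as mqtt  # requires paho-mqtt >= 2.0

    client = mqtt.Client(
        mqtt.CallbackAPIVersion.VERSION2,
        reconnect_on_failure=True,  # the default, shown for visibility
    )
    # Bounds, in seconds, for the wait between reconnect attempts.
    client.reconnect_delay_set(min_delay=min_delay, max_delay=max_delay)
    return client


def run(host, port=1883):
    """Let the library own the connection for the process lifetime."""
    client = build_client()
    client.connect_async(host, port)
    # Blocks until disconnect() is called; also retries the first connect.
    client.loop_forever(retry_first_connection=True)
```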

MattBrittan, Mar 03 '25 23:03

@MattBrittan Thanks, again an example of KISS: removing all that stuff and leaving it to the library to manage :) That is probably the best practice. I just realised that the auto-reconnect also uses an exponential delay.
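To make the exponential delay concrete, here is a standalone illustration of a doubling backoff with an upper bound. This is a sketch of the general technique, not the library's own code; the function name `backoff_delays` is hypothetical, and the bounds of 1 and 120 seconds are assumed values.

```python
from itertools import islice


def backoff_delays(min_delay=1, max_delay=120):
    """Yield wait times that double each attempt, capped at max_delay."""
    delay = min_delay
    while True:
        yield delay
        delay = min(delay * 2, max_delay)


# First nine waits with the assumed bounds of 1 and 120 seconds.
first_waits = list(islice(backoff_delays(), 9))
print(first_waits)  # [1, 2, 4, 8, 16, 32, 64, 120, 120]
```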

tstorek, Mar 03 '25 23:03