paho.mqtt.python

Suggestion: pedantic string encoding management

Open. cladmi opened this issue 9 years ago • 1 comment

In the same way that Python 3 removed automatic conversion from string to bytes, I would like a way to prevent publish/will_set from auto-converting payloads. My problem started with paho encoding Python 2 bytes to UTF-8 even when it should not (I saw the PR that fixes it), and I then tried to check whether encoding was handled properly in my application.

Now, in my client, I just overrode 'publish' in a subclass to assert that the payload is not a unicode string (and to convert bytes to bytearray to work around the bug). I had tried to take care of encoding from the beginning, but this showed me many places where auto-conversion had allowed bad string handling in my code.
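
A minimal sketch of that kind of override, written for Python 3 where the text type is `str`. The names `StrictPayloadMixin`, `FakeClient` and `StrictClient` are made up for illustration; `FakeClient` only stands in for a real client so the example runs without a broker, and in a real application the mixin would presumably be combined with paho's `Client` class instead.

```python
class StrictPayloadMixin:
    """Hypothetical mixin: reject text payloads before they reach publish()."""

    def publish(self, topic, payload=None, *args, **kwargs):
        # Refuse implicit text-to-bytes conversion; the caller must encode.
        if isinstance(payload, str):
            raise TypeError("payload must be bytes, not str; encode it explicitly")
        return super().publish(topic, payload, *args, **kwargs)


class FakeClient:
    """Stand-in for a real MQTT client (an assumption, not paho's API)."""

    def __init__(self):
        self.sent = []

    def publish(self, topic, payload=None, qos=0, retain=False):
        # Record the call instead of sending anything over the network.
        self.sent.append((topic, payload))
        return len(self.sent)


class StrictClient(StrictPayloadMixin, FakeClient):
    """Client whose publish() only accepts payloads that are already bytes."""
```

With this in place, `StrictClient().publish("a/b", "text")` raises `TypeError`, while `publish("a/b", "text".encode("utf-8"))` goes through unchanged.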

Also, in practice, paho can automatically encode to UTF-8 but cannot, of course, decode automatically, so the magic is not symmetric.

Ideas on how to implement it:

  • Add an attribute to the client to choose this mode or not.
  • Add an option to 'publish/will_set' that sets encoding to 'utf-8' by default (could be backward compatible) and add a decoded_payload(encoding='utf-8') method to MQTTMessage.
  • Remove auto-conversion and break everyone's application.
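
The second idea could look roughly like the sketch below. Everything here is hypothetical: `encode_payload` is a free-standing helper rather than a change to paho's `publish`, and `Message` is a toy class, not paho's `MQTTMessage`. Numeric payloads are deliberately left out to keep it short.

```python
def encode_payload(payload, encoding="utf-8"):
    """Hypothetical helper: turn a payload into bytes under an explicit policy.

    Passing encoding=None disables conversion, so text payloads are rejected.
    """
    if payload is None:
        return b""
    if isinstance(payload, (bytes, bytearray)):
        return bytes(payload)
    if isinstance(payload, str):
        if encoding is None:
            raise TypeError("text payload given but conversion is disabled")
        return payload.encode(encoding)
    raise TypeError("unsupported payload type: %s" % type(payload).__name__)


class Message:
    """Toy received message (an assumption, not paho's MQTTMessage)."""

    def __init__(self, topic, payload):
        self.topic = topic
        self.payload = payload  # raw bytes, exactly as received

    def decoded_payload(self, encoding="utf-8"):
        # The symmetric counterpart of the encoding option on the send side.
        return self.payload.decode(encoding)
```

The default of `'utf-8'` keeps the current behaviour, while `encoding=None` gives the pedantic mode, and the receiving side names its codec explicitly through `decoded_payload`.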

I would even have paho raise a warning whenever auto-conversion kicks in. Crashing would be problematic, as it could be triggered at runtime by a really well-hidden case. But this is a maintainer's choice, with other problems in mind.
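
As a rough illustration of the warning approach, using only the standard `warnings` module: `ImplicitEncodingWarning` and `to_bytes` are invented names, and this is not how paho currently behaves.

```python
import warnings


class ImplicitEncodingWarning(UserWarning):
    """Hypothetical warning category for implicit str-to-bytes conversion."""


def to_bytes(payload):
    """Convert a payload to bytes, warning when a str is encoded implicitly."""
    if isinstance(payload, str):
        warnings.warn(
            "str payload implicitly encoded as UTF-8; pass bytes instead",
            ImplicitEncodingWarning,
            stacklevel=2,  # point at the caller, not at this helper
        )
        return payload.encode("utf-8")
    if isinstance(payload, (bytes, bytearray)):
        return bytes(payload)
    raise TypeError("unsupported payload type: %s" % type(payload).__name__)
```

An application that wants the strict behaviour in its test suite could then call `warnings.simplefilter("error", ImplicitEncodingWarning)`, which turns the warning into an exception without affecting other users.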

cladmi, Jan 27 '17 14:01

I'm inclined to agree that paho should not take it upon itself to convert payloads, if possible. When it is absolutely necessary, it should be configurable as far as possible and should raise a warning when it is not configured.

Note that it is part of the MQTT spec for all of the following to be UTF-8 strings: Protocol Name, ClientId, Will topic, User name, Topic name, Topic filter. But I don't see any requirement for any other field to be UTF-8. Furthermore, if any of these fields contain ill-formed UTF-8, then the server or client MUST close the network connection.
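
For the fields that do have to be UTF-8, a client-side check might look like the sketch below. It assumes my reading of MQTT 3.1.1 section 1.5.3: no U+0000, no surrogate code points, and at most 65535 bytes because of the two-byte length prefix. `encode_mqtt_string` is a hypothetical helper, not a paho function.

```python
def encode_mqtt_string(text):
    """Encode text as the body of an MQTT UTF-8 string, or raise ValueError."""
    if "\x00" in text:
        raise ValueError("MQTT strings must not contain U+0000")
    try:
        # Strict encoding rejects lone surrogates (U+D800..U+DFFF).
        data = text.encode("utf-8")
    except UnicodeEncodeError as exc:
        raise ValueError("not encodable as well-formed UTF-8") from exc
    if len(data) > 65535:
        raise ValueError("MQTT strings are limited to 65535 bytes")
    return data
```

This applies to topics, client ids and user names only; the payload stays an opaque byte sequence.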

Also: https://github.com/mqtt/mqtt.github.io/wiki/clarify_utf8_strings

jamesmyatt, Jan 30 '17 14:01