Is there a way to disable decoding bytes into a string?
Some fields, such as token, peer id, and node id, are raw bytes stored as bencoded strings, so they are better treated as bytes than as str. Decoding these values is just a guess; see bcoding.py#L76. It's better to always get one predictable type than to check the type before handling the data.
So, is there a way to disable the decoding?
def _decode_buffer(f):
    """
    String types are normal (byte)strings
    starting with an integer followed by ':'
    which designates the string’s length.
    Since there’s no way to specify the byte type
    in bencoded files, we have to guess
    """
    strlen = int(_readuntil(f, _TYPE_SEP))
    buf = f.read(strlen)
    if not len(buf) == strlen:
        raise ValueError(
            'string expected to be {} bytes long but the file ended after {} bytes'
            .format(strlen, len(buf)))
    try:
        return buf.decode()
    except UnicodeDecodeError:
        return buf
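To make the guess concrete, here is a standalone sketch (it does not import bcoding) that copies only the try/except fallback above. The two 20-byte node ids are made-up values: both are equally valid ids, yet one comes back as str and the other as bytes, depending solely on whether the bytes happen to be valid UTF-8.

```python
def guess_decode(buf: bytes):
    """Mimic the fallback in _decode_buffer: str if valid UTF-8, else bytes."""
    try:
        return buf.decode()  # UTF-8 by default
    except UnicodeDecodeError:
        return buf


# Two hypothetical 20-byte node ids; only the byte values differ.
ascii_id = b"abcdefghijklmnopqrst"   # happens to be valid UTF-8
binary_id = bytes(range(236, 256))   # 0xEC..0xFF, not valid UTF-8

print(type(guess_decode(ascii_id)).__name__)
print(type(guess_decode(binary_id)).__name__)
```

Code consuming such a field therefore has to handle both types, which is the check this issue wants to avoid.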
Sure, we could add a parameter to bdecode that gets passed all the way down to the _decode_buffer function, like:
def bdecode(f_or_data, try_decode=True):
    ...
    ... _decode_buffer(f, try_decode)
    ...
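A minimal, self-contained sketch of how try_decode could be threaded through a recursive decoder. This is not bcoding's actual code: `_decode`, this `_readuntil`, `_TYPE_END`, and the `_END` sentinel are hypothetical stand-ins (only `_TYPE_SEP` and the body of `_decode_buffer` follow the excerpt above). To keep it short, it buffers the whole input in memory so it can push back the first length digit with `seek()`, and it uses the walrus operator (Python 3.8+).

```python
import io
from typing import BinaryIO, Union

_TYPE_SEP = b":"   # separates a string's length from its payload
_TYPE_END = b"e"   # terminates integers, lists and dicts
_END = object()    # hypothetical sentinel meaning "saw an 'e' marker"


def _readuntil(f: BinaryIO, delim: bytes) -> bytes:
    """Read and return bytes up to delim; the delimiter itself is consumed."""
    buf = bytearray()
    while True:
        c = f.read(1)
        if not c:
            raise ValueError("data ended before {!r}".format(delim))
        if c == delim:
            return bytes(buf)
        buf += c


def _decode_buffer(f: BinaryIO, try_decode: bool = True):
    strlen = int(_readuntil(f, _TYPE_SEP))
    buf = f.read(strlen)
    if len(buf) != strlen:
        raise ValueError(
            "string expected to be {} bytes long but the file ended after {} bytes"
            .format(strlen, len(buf)))
    if not try_decode:
        return buf  # caller wants raw bytes: skip the UTF-8 guess entirely
    try:
        return buf.decode()
    except UnicodeDecodeError:
        return buf


def _decode(f: BinaryIO, try_decode: bool):
    c = f.read(1)
    if c == _TYPE_END:
        return _END
    if c == b"i":
        return int(_readuntil(f, _TYPE_END))
    if c == b"l":
        items = []
        while (item := _decode(f, try_decode)) is not _END:
            items.append(item)
        return items
    if c == b"d":
        result = {}
        while (key := _decode(f, try_decode)) is not _END:
            value = _decode(f, try_decode)
            if value is _END:
                raise ValueError("dict key {!r} has no value".format(key))
            result[key] = value
        return result
    if c.isdigit():
        f.seek(-1, io.SEEK_CUR)  # un-read the first length digit
        return _decode_buffer(f, try_decode)
    raise ValueError("unexpected byte {!r}".format(c))


def bdecode(f_or_data: Union[bytes, BinaryIO], try_decode: bool = True):
    if isinstance(f_or_data, (bytes, bytearray)):
        f = io.BytesIO(f_or_data)
    else:
        f = io.BytesIO(f_or_data.read())  # buffer file input so seek() works
    result = _decode(f, try_decode)
    if result is _END:
        raise ValueError("unexpected end marker")
    return result


node_id = bytes(range(236, 256))  # made-up 20-byte id that is not valid UTF-8
msg = b"d1:ad2:id20:" + node_id + b"e1:q4:ping1:t2:aa1:y1:qe"
print(bdecode(msg))                    # str where UTF-8 worked, bytes otherwise
print(bdecode(msg, try_decode=False))  # every string, including dict keys, is bytes
```

One design choice to settle in a real PR: with try_decode=False this sketch also returns dict keys as bytes. A variant could always decode keys (they are usually ASCII) and leave only values raw, but that reintroduces a guess for keys.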
Care to do a PR?
Okay. I just need some time before I can work on this.
BTW, how about creating a custom exception class instead of raising ValueError and TypeError? Sometimes I need a way to tell whether a peer is sending garbage, but catching ValueError and TypeError occasionally also catches unrelated exceptions mixed in with them.
Sounds good!
Subclassing TypeError and ValueError would allow fine-grained exception handling while still letting existing code catch the more generic exceptions.
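A minimal sketch of what such a hierarchy could look like. All class and function names here (`BencodeError`, `BencodeDecodeError`, `BencodeEncodeError`, `parse_length`, `handle_packet`) are hypothetical, not part of bcoding. Decode errors subclass ValueError and encode errors subclass TypeError, and a shared base class lets callers catch every bencode error at once.

```python
class BencodeError(Exception):
    """Hypothetical common base for all bencode errors."""


class BencodeDecodeError(BencodeError, ValueError):
    """Malformed input, e.g. a peer sending garbage."""


class BencodeEncodeError(BencodeError, TypeError):
    """An object that has no bencode representation."""


def parse_length(data: bytes) -> int:
    """Hypothetical helper: parse the '<len>:' prefix of a bencoded string."""
    head, sep, _ = data.partition(b":")
    if not sep or not head.isdigit():
        raise BencodeDecodeError("invalid string length prefix: {!r}".format(data[:10]))
    return int(head)


def handle_packet(data: bytes) -> str:
    try:
        parse_length(data)
    except BencodeDecodeError:
        return "garbage from peer"  # only malformed bencode lands here
    return "ok"


# Existing handlers written against ValueError keep working unchanged.
try:
    parse_length(b"garbage")
except ValueError as exc:
    print("still caught as ValueError:", exc)
```

Because `BencodeDecodeError` is a ValueError, switching the library to raise it should not break callers that already catch ValueError.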
Great. I will make changes for the two things we discussed. I can't promise when, but I'll do it ASAP.