Pickle deserialization null byte discrepancy
Bug report
Bug description:
In C, a null byte marks the end of a char[] string. For the pickle INT and LONG opcodes, everything up to a newline is read from the bytestream and run through a string-to-integer conversion function. However, a bytestream like b'L1\x00anything\n.' or b'I1\x00anything\n.' is accepted by _pickle.c because of the null byte, whereas pickle.py and pickletools.py reject it.
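A minimal reproduction sketch, assuming a CPython build where `pickle.loads` uses the C accelerator `_pickle` (the default). `pickle._Unpickler` is the private pure-Python unpickler from pickle.py, and `pickletools.genops` decodes opcodes without executing them. The helpers `outcome`, `load_pure_python` and `disassemble` are hypothetical names. What the C loader does may depend on the Python version, so the sketch only reports it rather than asserting it.

```python
import io
import pickle
import pickletools

# Hand-crafted protocol-0 pickles from the report: a null byte sits between
# the digits and the newline that terminates the INT/LONG argument.
CRAFTED = {
    "INT": b"I1\x00anything\n.",
    "LONG": b"L1\x00anything\n.",
}


def outcome(load, data: bytes) -> str:
    """Run one loader on `data` and describe what happened."""
    try:
        return f"returned {load(data)!r}"
    except Exception as exc:  # only the kind of failure is reported
        return f"raised {type(exc).__name__}"


def load_pure_python(data: bytes):
    # pickle._Unpickler is the pure-Python implementation in pickle.py.
    return pickle._Unpickler(io.BytesIO(data)).load()


def disassemble(data: bytes):
    # Drive pickletools' opcode decoder without printing anything.
    return list(pickletools.genops(data))


for name, data in CRAFTED.items():
    print(name)
    print("  _pickle (C): ", outcome(pickle.loads, data))
    print("  pickle.py:   ", outcome(load_pure_python, data))
    print("  pickletools: ", outcome(disassemble, data))
```

Per this report, on 3.11 the C line would be expected to show `returned 1` for both pickles, while the pure-Python unpickler and pickletools raise ValueError.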
On line 5208 (for INT) and line 5362 (for LONG), _Unpickler_Readline(state, self, &s) reads everything, including the null byte, into s, which is a char *. However, strtol and PyLong_FromString stop at the first null byte, so everything from the null byte onward is ignored and the result is 1 in the examples above.
It's a small inconsistency in an edge case, but I'm not sure how to fix it, or whether stopping at a null byte is the desired behavior.
Edit: this also applies to FLOAT.
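To illustrate the mechanism, the following Python model treats the line the way a C string function would: only the bytes before the first NUL are visible. `c_string_view`, `parse_like_c` and `parse_like_python` are hypothetical helpers, and `int`/`float` merely stand in for strtol(), PyLong_FromString() and the C float parser; C-specific details such as base prefixes and overflow are ignored, so this is a model of the truncation only.

```python
def c_string_view(line: bytes) -> bytes:
    """Bytes a NUL-terminated-string function such as strtol() would read."""
    return line.split(b"\0", 1)[0]


def parse_like_c(line: bytes, convert):
    """Model of the C path: convert only the NUL-terminated view.

    Whatever follows the first NUL (including the real newline) is never
    examined, so the parse succeeds on the leading digits.
    """
    return convert(c_string_view(line).rstrip(b"\n"))


def parse_like_python(line: bytes, convert):
    """Roughly what pickle.py does: convert the whole line minus the newline."""
    return convert(line.rstrip(b"\n"))


print(parse_like_c(b"1\x00anything\n", int))      # INT/LONG-style line
print(parse_like_c(b"1.5\x00anything\n", float))  # FLOAT-style line
try:
    parse_like_python(b"1\x00anything\n", int)
except ValueError as exc:
    print("whole-line parse rejects it:", exc)
```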
CPython versions tested on:
3.11
Operating systems tested on:
Linux
How do you produce pickle files which contain null bytes? Using pickle.dumps()?
pickle.dumps() and other built-in functions wouldn't produce this, because they use repr() on the number, so such a pickle would have to be crafted by hand.
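A sketch of both halves of that answer, assuming protocol 0 (the text protocol that emits the INT and LONG opcodes): pickle.dumps() output for integers contains no null byte, while a pickle like the one in the report can be assembled by hand from the opcode constants that pickle.py defines (`pickle.INT` is b"I", `pickle.STOP` is b".").

```python
import pickle

# Protocol 0 writes integers as decimal text, so no NUL byte can appear
# in the INT/LONG argument (2**40 is large enough to use LONG).
for n in (0, 1, -7, 2**40):
    assert b"\0" not in pickle.dumps(n, protocol=0)

# A report-style pickle has to be built byte by byte.
crafted = pickle.INT + b"1" + b"\x00anything" + b"\n" + pickle.STOP
print(crafted)
```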
I don't think it's the desired behavior, but I don't see an easy fix. Unless someone proposes one, I suggest closing the issue, since it's mostly a theoretical problem.
It is a known issue that PyLong_FromString() truncates its input at an embedded null byte. We can use the private function _PyLong_FromBytes() or explicitly check strlen(). This is a relatively easy issue.
Does someone want to propose a PR using strlen()?
Just to clarify: would we use strlen() (which stops at the first null byte) to check that the null-terminated length equals the len returned by _Unpickler_Readline(), i.e. to detect whether a null byte is present? If not, how would strlen() help?
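One reading of the strlen() suggestion, sketched in Python: compare the C string length of the line with the length the reader reported, and reject the line if they differ, since that can only happen when a null byte is embedded. In _pickle.c this would roughly correspond to checking `strlen(s) != (size_t)len` after `_Unpickler_Readline()` and before parsing; which exception to raise is left open here. `c_strlen` and `check_line` are hypothetical names.

```python
def c_strlen(buf: bytes) -> int:
    """Model of C strlen(): number of bytes before the first NUL."""
    nul = buf.find(b"\0")
    return len(buf) if nul == -1 else nul


def check_line(line: bytes) -> bytes:
    """Reject an opcode argument line that contains an embedded NUL.

    `line` plays the role of the buffer `s`, and len(line) the length the
    reader returned. If strlen() stops short of that length, a NUL byte
    sits inside the line, so the argument is refused instead of being
    silently truncated by the parser.
    """
    if c_strlen(line) != len(line):
        raise ValueError("embedded null byte in opcode argument")
    return line


print(check_line(b"42\n"))  # no NUL: returned unchanged
try:
    check_line(b"1\x00anything\n")
except ValueError as exc:
    print("rejected:", exc)
```

The same guard covers INT, LONG, FLOAT and any other opcode whose argument is a newline-terminated line, because it runs before any conversion function sees the buffer.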
Note: I also discovered that the GET and PUT opcodes suffer from the same null byte discrepancy.
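The same kind of hand-crafted input can be built for the memo opcodes. This sketch assumes the protocol-0 PUT (`p`) and GET (`g`) opcodes, whose decimal memo key is terminated by a newline, and assembles the pickles from pickle.py's opcode constants. `outcome` and `load_pure_python` are hypothetical helpers; only the pure-Python result is version-independent, so the C result is merely printed.

```python
import io
import pickle

# MARK + LIST builds an empty list; PUT then stores it under a key that
# has a NUL byte before the newline.
crafted_put = (pickle.MARK + pickle.LIST
               + pickle.PUT + b"0\x00junk\n" + pickle.STOP)
# A normal PUT of key 0, POP, then a GET whose key carries a NUL byte.
crafted_get = (pickle.MARK + pickle.LIST + pickle.PUT + b"0\n"
               + pickle.POP + pickle.GET + b"0\x00junk\n" + pickle.STOP)


def outcome(load, data: bytes) -> str:
    """Run one loader on `data` and describe what happened."""
    try:
        return f"returned {load(data)!r}"
    except Exception as exc:  # only the kind of failure is reported
        return f"raised {type(exc).__name__}"


def load_pure_python(data: bytes):
    # pickle._Unpickler is the private pure-Python unpickler in pickle.py.
    return pickle._Unpickler(io.BytesIO(data)).load()


for name, data in (("PUT", crafted_put), ("GET", crafted_get)):
    print(name, "C:", outcome(pickle.loads, data),
          "| pickle.py:", outcome(load_pure_python, data))
```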