Pickle deserialization null byte discrepancy

Open Legoclones opened this issue 1 year ago • 7 comments

Bug report

Bug description:

In C, a null byte marks the end of a string (char[]). For the INT and LONG pickle opcodes, everything up to a newline is read from the bytestream and run through a string-to-integer conversion function. However, because of the null byte, a bytestream like b'L1\x00anything\n.' or b'I1\x00anything\n.' does not fail in _pickle.c, whereas it does fail in pickle.py and pickletools.py.

On line 5208 (for INT) and line 5362 (for LONG), _Unpickler_Readline(state, self, &s) reads everything (including a null byte) into the s variable, which is a char *. However, strtol or PyLong_FromString (1, 2) stop when the first null byte is encountered, so the null byte and everything after it is ignored and the result is 1 (in the above example).

It's a small inconsistency as an edge case, but I'm not sure how to fix it, or whether having it stopped at a null byte is desired behavior or not.

Edit - this also applies to FLOAT.

CPython versions tested on:

3.11

Operating systems tested on:

Linux

Legoclones avatar Nov 19 '24 02:11 Legoclones
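The discrepancy described in the report can be checked with a short script. This is a minimal sketch, assuming the accelerated unpickler is reachable as pickle.loads and the pure-Python one as pickle._loads (the fallback implementation defined in pickle.py); try_load is a hypothetical helper added for illustration. The result from the C implementation depends on the CPython version, so no expected output is given for it.

```python
import pickle

# Hand-crafted protocol 0 pickles from the report: INT and LONG with an embedded null byte.
PAYLOADS = [b"I1\x00anything\n.", b"L1\x00anything\n."]


def try_load(loader, payload):
    """Hypothetical helper: return ('ok', value) or ('error', exception type name)."""
    try:
        return ("ok", loader(payload))
    except Exception as exc:
        return ("error", type(exc).__name__)


for payload in PAYLOADS:
    # pickle.loads uses _pickle.c when available; pickle._loads is the pure-Python unpickler.
    print(payload)
    print("  accelerated:", try_load(pickle.loads, payload))
    print("  pure Python:", try_load(pickle._loads, payload))
```

If the two lines printed for a payload differ, the interpreter in use shows the inconsistency.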

However, strtol or PyLong_FromString (1, 2) stop when the first null byte is encountered

How do you produce pickle files which contain null bytes? Using pickle.dumps()?

vstinner avatar Nov 19 '24 11:11 vstinner

pickle.dumps() or other built in functions wouldn't produce this because they use repr() on a number, so this would have to be a custom pickle made by hand.

Legoclones avatar Nov 19 '24 18:11 Legoclones
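To illustrate the point about pickle.dumps(), the sketch below serializes a few numbers with protocol 0, the text protocol that emits the INT, LONG and FLOAT opcodes, and confirms that none of the results contains a null byte. The sample values are arbitrary choices for illustration.

```python
import pickle

# Arbitrary sample values: a small int, a negative int, an int beyond 32 bits, and a float.
SAMPLES = [1, -7, 2**40, 1.5]

for value in SAMPLES:
    data = pickle.dumps(value, protocol=0)
    # Protocol 0 writes the number as ASCII text followed by a newline.
    print(repr(value), "->", data, "contains null byte:", b"\x00" in data)
```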

It's a small inconsistency as an edge case, but I'm not sure how to fix it, or whether having it stopped at a null byte is desired behavior or not.

I don't think that it's the desired behavior, and I don't see any easy fix. Unless someone has a fix, I suggest closing the issue, since it's more of a theoretical issue.

vstinner avatar Nov 20 '24 08:11 vstinner

It is a known issue that PyLong_FromString() truncates input at an embedded null byte. We can use the private function _PyLong_FromBytes() or explicitly check strlen(). This is a relatively easy issue.

serhiy-storchaka avatar Nov 20 '24 15:11 serhiy-storchaka

Does someone want to propose a PR using strlen()?

vstinner avatar Nov 21 '24 09:11 vstinner

Just to clarify, would we be using strlen() (which stops at a null byte) to ensure that the null-terminated length is the same as the len from the program? To check if there's a null byte present or not? If not, how would strlen() help?

Legoclones avatar Nov 22 '24 23:11 Legoclones
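One way to read the strlen() suggestion is as a length comparison: if strlen() of the buffer is shorter than the length reported by the line reader, the line contains an embedded null byte and can be rejected. The sketch below models that idea in Python. It is not the actual patch; c_strlen, has_embedded_nul, parse_int_line and the error message are hypothetical names chosen for illustration.

```python
def c_strlen(buf: bytes) -> int:
    """Model of C strlen(): count the bytes before the first null byte."""
    nul = buf.find(b"\x00")
    return len(buf) if nul == -1 else nul


def has_embedded_nul(line: bytes) -> bool:
    """True when strlen() would disagree with the length the reader reported."""
    return c_strlen(line) != len(line)


def parse_int_line(line: bytes) -> int:
    """Hypothetical stand-in for the INT/LONG parsing step, with the length check added."""
    if has_embedded_nul(line):
        raise ValueError("embedded null byte in integer literal")
    return int(line, 0)


print(c_strlen(b"1\x00anything\n"), len(b"1\x00anything\n"))
print(parse_int_line(b"1\n"))
```

With the check in place, the payload from the report is rejected instead of being silently truncated to 1.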

Note - I also discovered that the GET and PUT opcodes suffer from the same null byte discrepancy.

Legoclones avatar Jan 03 '26 05:01 Legoclones