Unique `pathlib.Path` instances cause memory usage to increase indefinitely in 3.12 only

Open rafsaf opened this issue 1 year ago • 0 comments

Bug report

Bug description:

Minimal example:

from pathlib import Path
import resource
import random
import string


def randomword() -> str:
    letters = string.ascii_lowercase
    return "".join(random.choice(letters) for _ in range(100))


if __name__ == "__main__":
    while True:
        # debug print: peak resident set size (ru_maxrss, in KiB on Linux)
        print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

        str(Path(randomword()))

Hi! The code above causes memory usage to grow continuously on Linux. I left in a print of the memory usage (`ru_maxrss`, the peak resident set size), but it is not needed to reproduce the problem. Tools like memray, together with my own investigation, suggest this may be related to the use of `sys.intern` in pathlib's `_parse_path` function. As far as I understand, the memory is not released after the references are gone (or, judging by what this function does, at some point no new memory should be allocated in the interned table, which seems to be the case in other CPython versions; this is just a guess though, I didn't dive that deep). Even if this is a completely different issue or expected behaviour, I would expect memory not to increase forever, and the code works as I would expect in 3.11.
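For readers unfamiliar with `sys.intern`, here is a minimal sketch of what it does in general. This is not pathlib's code; the helper `build` and the sample strings are made up for illustration. Two equal strings built at runtime are mapped to one shared object once interned, which means the interpreter has to keep a table of interned strings.

```python
import sys


def build(parts: list[str]) -> str:
    # Hypothetical helper: build the string at runtime so that it is not
    # a compile-time constant that the compiler may already share.
    return "".join(parts)


a = build(["some", "_", "segment"])
b = build(["some", "_", "segment"])

# sys.intern returns the canonical object for a given string value.
ia = sys.intern(a)
ib = sys.intern(b)

same_value = a == b
same_object_after_intern = ia is ib
print(same_value, same_object_after_intern)
```

If every path contains a new unique string, each one would add a new entry to that table, so whether those entries are ever released matters for long-running processes.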

I have only been able to reproduce it on Debian and Ubuntu (amd64) with Python 3.12, not with 3.11, 3.10, 3.9 or 3.8, and not with 3.13.0b3.
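To compare interpreter versions without an endless loop, a bounded variant of the example above could look like the sketch below. The function names `random_word` and `peak_rss_growth` and the iteration count are my own choices, not part of the original reproducer. It assumes Linux, where `ru_maxrss` is reported in KiB, and since `ru_maxrss` is a high-water mark the result can only show growth, never a decrease.

```python
import random
import resource
import string
from pathlib import Path


def random_word(rng: random.Random, length: int = 100) -> str:
    # Same idea as randomword() above, but with a seeded generator so
    # runs on different interpreters create the same sequence of paths.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def peak_rss_growth(iterations: int, seed: int = 0) -> int:
    """Return how much ru_maxrss grew while creating unique paths."""
    rng = random.Random(seed)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    for _ in range(iterations):
        # Create a unique Path and convert it to str, as in the reproducer.
        str(Path(random_word(rng)))
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return after - before
```

Calling, for example, `peak_rss_growth(200_000)` under each interpreter and printing the result next to `sys.version` should give numbers that can be compared directly, although allocator behaviour adds some noise.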

Initially I was debugging memory problems in code that creates a large number of unique paths. It was weird that memory usage was very high after the program had been running for weeks in Kubernetes (plus some spare hours to eventually find the culprit), so this can definitely have an impact. In code that doesn't create many paths, the difference may be subtle and negligible.

CPython versions tested on:

3.8, 3.9, 3.10, 3.11, 3.12, 3.13

Operating systems tested on:

Linux

rafsaf, Jul 15 '24 00:07