pyperf Windows bench_command memory tracking fails

Creating a new issue to track a PR I'm working on to fix the issue in the title. Related issue from May 2021: #97

I was attempting to track the memory usage of command benchmarks on Windows, but got the following errors when doing so:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 769, in <module>
    main()
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 765, in main
    func()
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 734, in cmd_bench_command
    runner.bench_command(name, command)
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 747, in bench_command
    return self._main(task)
           ^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 460, in _main
    bench = self._worker(task)
            ^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 434, in _worker
    run = task.create_run()
          ^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_worker.py", line 299, in create_run
    self.compute()
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_command.py", line 70, in compute
    raise RuntimeError("failed to get the process RSS")
RuntimeError: failed to get the process RSS
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "\.venv\Scripts\pyperf.exe\__main__.py", line 10, in <module>
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 765, in main
    func()
  File "D:\my_project\.venv\Lib\site-packages\pyperf\__main__.py", line 734, in cmd_bench_command
    runner.bench_command(name, command)
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 747, in bench_command
    return self._main(task)
           ^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 465, in _main
    bench = self._manager()
            ^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_runner.py", line 678, in _manager
    bench = Manager(self).create_bench()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 243, in create_bench
    worker_bench, run = self.create_worker_bench()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 142, in create_worker_bench
    suite = self.create_suite()
            ^^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 132, in create_suite
    suite = self.spawn_worker(self.calibrate_loops, 0)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\my_project\.venv\Lib\site-packages\pyperf\_manager.py", line 118, in spawn_worker
    raise RuntimeError("%s failed with exit code %s"
RuntimeError: D:\my_project\.venv\Scripts\python.exe failed with exit code 1

I debugged my way through the code and found the root cause, which is located here: https://github.com/psf/pyperf/blob/e0610c2f263c300de870e7bc8e494d46e0fd71be/pyperf/_process_time.py#L25-L42

In short, this function gets the current process's resident set size using the resource module, but that module is only available on Unix-like systems (Linux, macOS), not on Windows. When run on Windows, the function simply returns 0, which downstream callers treat as an error, so the benchmark fails entirely.
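
As a rough illustration of that failure mode (a sketch, not the actual pyperf source), the pattern looks something like the following. The names get_max_rss_sketch and check_rss are hypothetical, and the use of RUSAGE_SELF is an assumption made for the example.

```python
import sys

try:
    import resource  # standard library, but missing on Windows
except ImportError:
    resource = None


def get_max_rss_sketch():
    """Return the peak RSS of this process in bytes, or 0 if unavailable."""
    if resource is None:
        # Windows lands here: there is no resource module to ask.
        return 0
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        return max_rss
    return max_rss * 1024


def check_rss(rss):
    """Hypothetical caller that treats a falsy RSS as a hard failure."""
    if not rss:
        raise RuntimeError("failed to get the process RSS")
    return rss
```

Under these assumptions, check_rss(get_max_rss_sketch()) on Windows would raise the same RuntimeError that appears in the first traceback.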

I began working on a fork where I instead use psutil to get the current process's RSS, but I noticed that psutil.Process().memory_info().rss returns higher values than the measurements from the resource module. I'm seeing roughly 25% to 35% higher RSS with psutil, which leads to a dilemma in terms of accuracy across operating systems. We have a few options:

psutil works cross-platform, but its RSS values do not match what the resource module reports. We can opt to use only psutil moving forward, but that would invalidate all existing command benchmark results until they are re-run.
We can use psutil only on Windows, but this leads to a memory usage discrepancy between operating systems. On my Mac Mini, the resource and psutil RSS values differed by a wide margin, so Windows systems would falsely appear to have higher memory usage than Mac systems (and presumably Linux ones as well).
We can use some other data point, such as the Unique Set Size (USS) from psutil through psutil.Process().memory_full_info().uss. USS is closer to what the resource module reports for RSS, though it is about 15% smaller. USS is supposed to be the closest representation of a process's actual memory usage, which would make it preferable to RSS or peak RSS.
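
To compare the three data points mentioned in these options side by side, something like the sketch below could be used. It is only an illustration: the helper names are hypothetical, psutil is treated as an optional third-party dependency, and it measures the current process rather than a benchmarked command. Note that ru_maxrss is a peak value while psutil's rss is the current value, which may account for part of any difference.

```python
import sys

try:
    import resource  # Unix only
except ImportError:
    resource = None

try:
    import psutil  # third-party, optional in this sketch
except ImportError:
    psutil = None


def resource_peak_rss():
    """Peak RSS in bytes from getrusage(), or None without the resource module."""
    if resource is None:
        return None
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Bytes on macOS, kilobytes on Linux.
    return max_rss if sys.platform == "darwin" else max_rss * 1024


def psutil_metric(name):
    """Return 'rss' or 'uss' in bytes for this process via psutil, or None."""
    if psutil is None:
        return None
    proc = psutil.Process()
    try:
        if name == "uss":
            return proc.memory_full_info().uss
        return proc.memory_info().rss
    except psutil.Error:
        return None


def collect_memory_metrics():
    """Gather every metric that is available on this platform."""
    return {
        "resource_peak_rss": resource_peak_rss(),
        "psutil_rss": psutil_metric("rss"),
        "psutil_uss": psutil_metric("uss"),
    }


if __name__ == "__main__":
    for key, value in collect_memory_metrics().items():
        print(key, value)
```

Running this on each operating system would show which metrics are available there and how far apart they are for the same process.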

I'm not aware of any other way to get the memory usage of a process without writing some C bindings. What's more confusing is that there is also the _win_memory.py file, which uses Windows-native functionality to track memory usage, but from my testing it's not used correctly; if it were, I wouldn't be getting the above error. I see in both _runner.py and _worker.py that we choose which method to use based on the OS that is running. If we go with psutil to unify the memory tracking of command benchmarks, should we do the same for regular benchmarks?

Apr 23 '25 00:04 HunterAP23

Hate to bother but I figure this could be a big change - any opinions @mdboom ?

Apr 23 '25 21:04 HunterAP23

Using psutil.Process().memory_info().rss on Windows sounds like a good idea.

> What's more confusing is that there is also the _win_memory.py file that uses Windows-native functionality to track memory usage, but from my testing that's not used correctly - if it was then I wouldn't be getting the above error.

It's used by collect_metadata to set mem_peak_pagefile_usage metadata. I don't know how it compares to RSS memory.

What does psutil use to compute the RSS memory?

Apr 24 '25 09:04 vstinner

On Windows they use a C extension that opens the process with the PROCESS_QUERY_LIMITED_INFORMATION access right and reads the PROCESS_MEMORY_COUNTERS struct: https://github.com/giampaolo/psutil/blob/master/psutil/arch/windows/proc.c#L359-L386

On Linux they instead read the process's statm file under /proc: https://github.com/giampaolo/psutil/blob/master/psutil/_pslinux.py#L1913-L1930
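
For reference, reading RSS from statm by hand looks roughly like the sketch below. This is not psutil's code, just a minimal illustration of the same idea; the function name rss_from_statm is hypothetical, and it returns None on systems without /proc.

```python
import os


def rss_from_statm(pid="self"):
    """Return the resident set size in bytes from /proc/<pid>/statm, or None."""
    path = f"/proc/{pid}/statm"
    try:
        with open(path) as f:
            fields = f.read().split()
    except OSError:
        # No procfs here (for example on Windows or macOS).
        return None
    # The second field of statm is the number of resident pages.
    resident_pages = int(fields[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")
```

Calling rss_from_statm() with no argument inspects the current process; passing a PID inspects another process, subject to the usual permissions.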

And lastly, on macOS they have this C extension, but I'm having trouble following where the data that fills their proc_taskinfo struct is pulled from: https://github.com/giampaolo/psutil/blob/master/psutil/arch/osx/proc.c#L431-L475

Apr 24 '25 12:04 HunterAP23