[BUG] Parallel builds result in different module content
setuptools version
67.7.2
Python version
3.11.3
OS
Gentoo
Additional environment information
This is environment independent and will reproduce on any OS and with any Python/setuptools version.
Description
If setup.py defines two or more Extensions that share a common source file (e.g. helpers.c containing helper functions used by both extensions), then this file is compiled multiple times (which is not expected, but tolerable). But when running a parallel build, the compilers spawned to build the common file(s) step on each other's toes and overwrite each other's results. For the best experience, let the compiler/linker process the .o file further, e.g. by turning on LTO.
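To illustrate the collision, here is a minimal sketch in plain Python. It is a simplified model, not setuptools' actual code: it assumes the object path is derived only from the source file name inside one shared temporary build directory. The extension names, the RETVAL macro and the build/temp directory are hypothetical and not taken from the reproducer.

```python
import os

# Hypothetical description of two extensions that share helpers.c.
# The names and the RETVAL macro are illustrative assumptions.
EXTENSIONS = {
    "mod1": {"sources": ["mod1.c", "helpers.c"], "macros": {"RETVAL": "12"}},
    "mod2": {"sources": ["mod2.c", "helpers.c"], "macros": {"RETVAL": "42"}},
}

# Assumed shared temporary build directory.
BUILD_TEMP = os.path.join("build", "temp")


def object_path(source, build_temp=BUILD_TEMP):
    """Simplified model: the object path depends only on the source name."""
    base, _ext = os.path.splitext(source)
    return os.path.join(build_temp, base + ".o")


def find_collisions(extensions):
    """Return {object path: [extensions writing it]} for shared paths."""
    writers = {}
    for name, ext in extensions.items():
        for source in ext["sources"]:
            writers.setdefault(object_path(source), []).append(name)
    return {path: names for path, names in writers.items() if len(names) > 1}


# Both extensions end up writing the same helpers.o.
print(find_collisions(EXTENSIONS))
```

In this model the per-extension macros never enter the path, so two concurrent compiler processes with different flags would target the same file.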
I originally met a variation of this bug here: https://bugs.gentoo.org/907718
And an example of a project that declares multiple Extension-s and yet uses this common helpers file is: https://gitlab.com/libvirt/libvirt-python/
In the Gentoo bug, a different variation of this very bug can be seen: after gcc produced libvirt-utils.o / typewrappers.o, the linker started doing LTO (i.e. an lto1 process was spawned), but then another gcc came along and started rewriting one of those .o files, disrupting the already running linker. This is the error that can be found in the attachment of the Gentoo bug:
lto1: error: build/temp.linux-x86_64-cpython-310/libvirt-utils.o: file too short
lto1: fatal error: errors during merging of translation units
And before this is disregarded because LTO is unstable/experimental: I have a reproducer below without LTO.
Expected behavior
Build result should not depend on number of parallel jobs.
How to Reproduce
- Clone https://github.com/zippy2/setuptools_reproducer
- Follow the steps from Readme.txt; basically, it compiles the minimal reproducer twice, once with -j1 and then with -j2, to observe the difference in the resulting modules.
Output
Non-parallel build:
1) ./setup.py build -j1
2) objdump -d build/lib*/mod1*.so | grep -A5 myfunction
0000000000001135 <myfunction>:
1135: 55 push %rbp
1136: 48 89 e5 mov %rsp,%rbp
1139: b8 0c 00 00 00 mov $0xc,%eax
113e: 5d pop %rbp
113f: c3 ret
3) objdump -d build/lib*/mod2*.so | grep -A5 myfunction
0000000000001135 <myfunction>:
1135: 55 push %rbp
1136: 48 89 e5 mov %rsp,%rbp
1139: b8 2a 00 00 00 mov $0x2a,%eax
113e: 5d pop %rbp
113f: c3 ret
As expected, myfunction() returns value 12 for mod1 and value 42 for mod2.
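The values can be read straight from the disassembly: the bytes after opcode b8 (mov $imm32,%eax) are a little-endian 32-bit immediate. A small sketch that decodes them; the helper name is hypothetical:

```python
def mov_eax_immediate(hex_bytes):
    """Decode the 32-bit immediate of an x86 `mov $imm32,%eax` (opcode b8)."""
    raw = bytes.fromhex(hex_bytes)  # fromhex skips the spaces between bytes
    if len(raw) != 5 or raw[0] != 0xB8:
        raise ValueError("not a `mov $imm32,%eax` encoding")
    # The immediate is stored least significant byte first.
    return int.from_bytes(raw[1:], "little")


print(mov_eax_immediate("b8 0c 00 00 00"))  # mod1: prints 12
print(mov_eax_immediate("b8 2a 00 00 00"))  # mod2: prints 42
```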
Parallel build:
1) ./setup.py build -j2
2) objdump -d build/lib*/mod1*.so | grep -A5 myfunction
0000000000001135 <myfunction>:
1135: 55 push %rbp
1136: 48 89 e5 mov %rsp,%rbp
1139: b8 2a 00 00 00 mov $0x2a,%eax
113e: 5d pop %rbp
113f: c3 ret
3) objdump -d build/lib*/mod2*.so | grep -A5 myfunction
0000000000001135 <myfunction>:
1135: 55 push %rbp
1136: 48 89 e5 mov %rsp,%rbp
1139: b8 2a 00 00 00 mov $0x2a,%eax
113e: 5d pop %rbp
113f: c3 ret
With -j2, however, myfunction() returns value 42 in both mod1 and mod2.
This was also https://github.com/python/cpython/issues/87625 which has a reproducible test case. I recently hit it with https://github.com/sagemath/pplpy/ where several extensions share the same source file.
It's likely hard to solve since the solution is to get a fully defined project-wide build graph like a Makefile / ninja file, which would require serious rearchitecting of setuptools to accomplish.
pplpy could port to meson-python as I think sagemath is already interested in this (ppl_shim.cc can then become an object library linked into each extension), or somehow enforce -j1.
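One possible way to enforce -j1 from within setup.py is sketched below. It assumes that build_ext keeps its current behaviour of choosing between serial and parallel builds based on its parallel attribute; the class name SerialBuildExt is hypothetical. It works around the race by giving up parallelism and does not fix the underlying problem.

```python
from setuptools.command.build_ext import build_ext


class SerialBuildExt(build_ext):
    """build_ext variant that ignores any -j/--parallel request."""

    def finalize_options(self):
        super().finalize_options()
        # Assumption: a falsy `parallel` makes build_ext build the
        # extensions one after another instead of using a worker pool.
        self.parallel = None
```

A project would then pass cmdclass={"build_ext": SerialBuildExt} to its setup() call, so that the shared source file is never compiled by two processes at once.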
Unfortunately, meson-python is too new for anything that aims to support a non-bleeding edge distro (like CentOS Stream 9, Ubuntu 22.04, openSUSE Leap 15, etc.), which is a sensible requirement for any mature project. I mean, in libvirt we have a policy which grants users a transition period of two years to adapt to a new major release. OTOH, I must say that rewriting libvirt-python with meson-python (though only locally for now) feels much cleaner than setuptools.
So setuptools is not willing to fix it, yet we can't use meson-python. What a pickle.