[BUG] Parallel builds result in different module content

Open zippy2 opened this issue 2 years ago • 3 comments

setuptools version

67.7.2

Python version

3.11.3

OS

Gentoo

Additional environment information

This is environment-independent and reproduces on any OS, and even with any Python/setuptools version.

Description

If setup.py defines two or more Extension objects that share a common source file (e.g. helpers.c containing helper functions used by both extensions), then this file is compiled multiple times (which is not expected, but tolerable). But during a parallel build, the compiler processes spawned to build said common file(s) step on each other's toes and overwrite each other's output. For the best experience, let the compiler/linker process the .o files further, e.g. by turning on LTO.
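A rough model of why the object files collide: as far as I can tell, setuptools' build_ext (inherited from distutils) compiles every extension into the same build_temp directory and names each object after its source path, not after the extension it belongs to. The pure-Python sketch below models that naming under this assumption; object_path and find_collisions are hypothetical helpers, not setuptools API.

```python
import os
from collections import defaultdict


def object_path(build_temp, source):
    """Approximate the object file name build_ext derives for a source.

    Assumption: like distutils' CCompiler.object_filenames(), the name is
    the source path with its suffix swapped for ".o", joined to the shared
    build_temp directory. The extension name plays no part in it.
    """
    base, _ext = os.path.splitext(source)
    return os.path.join(build_temp, base.lstrip("/") + ".o")


def find_collisions(extensions, build_temp):
    """Return {object path: [extension names]} for objects claimed twice.

    `extensions` is a hypothetical mapping of extension name -> sources,
    standing in for the Extension objects declared in setup.py.
    """
    owners = defaultdict(list)
    for name, sources in extensions.items():
        for src in sources:
            owners[object_path(build_temp, src)].append(name)
    # Keep only objects that more than one extension would write.
    return {obj: names for obj, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    exts = {
        "mod1": ["mod1.c", "helpers.c"],
        "mod2": ["mod2.c", "helpers.c"],
    }
    print(find_collisions(exts, "build/temp.linux-x86_64-cpython-311"))
    # On POSIX: {'build/temp.linux-x86_64-cpython-311/helpers.o': ['mod1', 'mod2']}
```

Every path reported here is written by more than one compile job: in a serial build each extension simply recompiles it before linking, but with -jN those jobs race, and if the extensions pass different flags to the shared file, one extension can end up linked against the other's object.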

I originally ran into a variation of this bug here: https://bugs.gentoo.org/907718

An example of a project that declares multiple Extension objects and yet uses this common helpers file is: https://gitlab.com/libvirt/libvirt-python/

The Gentoo bug shows a different variation of this very bug: after gcc produced libvirt-utils.o / typewrappers.o, the linker started doing LTO (i.e. an lto1 process was spawned), but then another gcc came along and started rewriting one of those .o files, disrupting the already running linker. This is the error found in the attachment to the Gentoo bug:

lto1: error: build/temp.linux-x86_64-cpython-310/libvirt-utils.o: file too short
lto1: fatal error: errors during merging of translation units

And before this is dismissed because LTO is unstable/experimental: I have a reproducer below that doesn't use LTO.

Expected behavior

Build result should not depend on number of parallel jobs.

How to Reproduce

  1. Clone https://github.com/zippy2/setuptools_reproducer
  2. Follow the steps from Readme.txt. Basically, it compiles the minimal reproducer twice, once with -j1 and then with -j2, to observe the difference in the resulting modules.

Output

Non-parallel build:
1) ./setup.py build -j1
2) objdump -d build/lib*/mod1*.so | grep -A5 myfunction
   0000000000001135 <myfunction>:
    1135:       55                      push   %rbp
    1136:       48 89 e5                mov    %rsp,%rbp
    1139:       b8 0c 00 00 00          mov    $0xc,%eax
    113e:       5d                      pop    %rbp
    113f:       c3                      ret

3) objdump -d build/lib*/mod2*.so | grep -A5 myfunction
   0000000000001135 <myfunction>:
    1135:       55                      push   %rbp
    1136:       48 89 e5                mov    %rsp,%rbp
    1139:       b8 2a 00 00 00          mov    $0x2a,%eax
    113e:       5d                      pop    %rbp
    113f:       c3                      ret

As expected, myfunction() returns 12 (0xc) for mod1 and 42 (0x2a) for mod2.

Parallel build:
1) ./setup.py build -j2
2) objdump -d build/lib*/mod1*.so | grep -A5 myfunction
   0000000000001135 <myfunction>:
    1135:       55                      push   %rbp
    1136:       48 89 e5                mov    %rsp,%rbp
    1139:       b8 2a 00 00 00          mov    $0x2a,%eax
    113e:       5d                      pop    %rbp
    113f:       c3                      ret

3) objdump -d build/lib*/mod2*.so | grep -A5 myfunction
   0000000000001135 <myfunction>:
    1135:       55                      push   %rbp
    1136:       48 89 e5                mov    %rsp,%rbp
    1139:       b8 2a 00 00 00          mov    $0x2a,%eax
    113e:       5d                      pop    %rbp
    113f:       c3                      ret
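Comparing the disassembly by eye gets tedious across repeated -j1/-j2 runs. A small helper can pull out the constant that myfunction loads into %eax from `objdump -d` text. This is a sketch assuming x86-64 AT&T syntax and an unoptimized function body like the ones above; return_constant is a hypothetical name.

```python
import re

# Matches e.g. "b8 0c 00 00 00   mov    $0xc,%eax" (AT&T syntax).
MOV_EAX = re.compile(r"\bmov\s+\$(0x[0-9a-fA-F]+),%eax\b")


def return_constant(objdump_text, symbol="myfunction"):
    """Return the immediate moved into %eax inside `symbol`, or None."""
    in_symbol = False
    for line in objdump_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(f"<{symbol}>:"):
            in_symbol = True  # symbol header, e.g. "... <myfunction>:"
            continue
        if in_symbol:
            if stripped.endswith(">:"):
                break  # the next symbol's header: stop searching
            match = MOV_EAX.search(line)
            if match:
                return int(match.group(1), 16)
    return None
```

Fed the output above, it yields 12 for mod1 and 42 for mod2 after the -j1 build, but 42 for both modules after the -j2 build, which is the mismatch this issue is about.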

zippy2, Jun 05 '23 03:06

This was also reported as https://github.com/python/cpython/issues/87625, which has a reproducible test case. I recently hit it with https://github.com/sagemath/pplpy/, where several extensions share the same source file.

orlitzky, Apr 01 '24 00:04

It's likely hard to solve since the solution is to get a fully defined project-wide build graph like a Makefile / ninja file, which would require serious rearchitecting of setuptools to accomplish.

pplpy could port to meson-python, as I think sagemath is already interested in that (ppl_shim.cc could then become an object library linked into each extension), or otherwise enforce -j1.
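To make the build-graph argument concrete, here is a toy planner. It is a hypothetical sketch, not setuptools or meson-python code: each compile job is keyed by (source, macros), so a shared file built with identical flags is compiled once and its object linked into every extension (the object-library idea mentioned for ppl_shim.cc), while differing flags get distinct object paths. The plan_compiles name and the dict layout of `extensions` are assumptions.

```python
import hashlib
import os


def plan_compiles(extensions, build_temp):
    """Plan compile jobs keyed by (source, macros).

    `extensions` is a hypothetical mapping:
        name -> {"sources": [...], "macros": [(NAME, VALUE), ...]}
    Returns (jobs, link_inputs):
        jobs:        (source, macros) -> object path, one entry per compile
        link_inputs: extension name -> object paths to link
    """
    jobs = {}
    link_inputs = {}
    for name, spec in extensions.items():
        # Macros apply to every source of an extension, as define_macros does.
        macros = tuple(spec.get("macros", ()))
        objects = []
        for src in spec["sources"]:
            key = (src, macros)
            if key not in jobs:
                base = os.path.splitext(src)[0]
                # A short digest of the macros keeps differing builds apart.
                tag = hashlib.sha1(repr(macros).encode()).hexdigest()[:8]
                jobs[key] = os.path.join(build_temp, f"{base}-{tag}.o")
            objects.append(jobs[key])
        link_inputs[name] = objects
    return jobs, link_inputs
```

Because no two entries in `jobs` share an output path, the jobs can run in any order or in parallel without clobbering each other. If, hypothetically, mod1 and mod2 passed different define_macros, helpers.c would get two separate objects instead of two compilers fighting over one helpers.o.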

eli-schwartz, Apr 01 '24 01:04

Unfortunately, meson-python is too new for anything that aims to support non-bleeding-edge distros (like CentOS Stream 9, Ubuntu 22.04, openSUSE Leap 15, etc.), which is a sensible requirement for any mature project. In libvirt, for instance, we have a policy that grants users a two-year transition period to adapt to a new major release. OTOH, I must say that porting libvirt-python to meson-python (only locally, for now) feels much cleaner than setuptools.

So setuptools is not willing to fix it, yet we can't use meson-python. What a pickle.

zippy2, Apr 27 '24 19:04