`py_binary` targets added to `py_wheel.deps` end up including the binary in the wheel. Sounds like WAI but it was surprising to me
π bug report
Affected Rule
-
py_wheel
Is this a regression?
Nope, not that I can tell at least.
Description
Adding a py_binary target as a dep to py_wheel results in both the binary's srcs being added to the wheel and the binary itself. It's this latter part that is unexpected (at least to me, haha).
Maybe I'm missing an option that causes the bazel binaries to be excluded?
π¬ Minimal Reproduction
py_binary(
name = "foo",
srcs = ["foo.py"],
)
py_wheel(
...,
distribution = "our_tables",
deps = [":foo"],
)
$ less bazel-bin/path/to/foo/foo-0.0.1-py3-none-any.whl | cat
Metadata-Version: 2.1
Name: our_tables
Version: 0.0.1
Archive: bazel-bin/path/to/foo/our_tables-0.0.1-py3-none-any.whl
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
24451 Defl:N 7964 67% 1980-01-01 00:00 2d00cf53 our_tables/foo
5124 Defl:N 1787 65% 1980-01-01 00:00 4846e197 our_tables/foo.py
91 Defl:N 86 6% 1980-01-01 00:00 4f1122ec our_tables-0.0.3.dist-info/WHEEL
9031 Defl:N 3390 63% 1980-01-01 00:00 e00f51c6 our_tables-0.0.3.dist-info/METADATA
698 Defl:N 445 36% 1980-01-01 00:00 8c16618f our_tables-0.0.3.dist-info/RECORD
-------- ------- --- -------
47408 16577 65% 5 files
It's that our_tables/foo that's strange to me. IMO it should not be included by default.
π₯ Exception or Error
N/A
π Your Environment
Operating System:
gLinux
Output of bazel version:
$ bazel version
Bazelisk version: v1.20.0
Build label: 7.6.1
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Mar 31 17:08:56 2025 (1743440936)
Build timestamp: 1743440936
Build timestamp as int: 1743440936
Rules_python version:
1.4.1
Anything else relevant?
In some sense I think this is working as intended, because the executable is in the default outputs group.
Maybe what we should change is to not allow py_binary and py_test rules to be added to the py_wheel, because actually what you want is only py_library files in the wheel. However, that is a breaking change.
Maybe the py_wheel should instead only depend on the PyInfo provider (or something similar) to make it just work. In some sense I imagine users using py_binary as a py_library and not being able to package your code without changing your project structure sounds like a huge rough corner.
That said, I would not call the current behaviour a bug.
Yeah, I kinda figured it was WAI.
Maybe what we should change is to not allow
py_binaryandpy_testrules to be added to thepy_wheel, because actually what you want is onlypy_libraryfiles in the wheel. However, that is a breaking change.
Agreed, this is (a) breaking and (b) not the desired behavior. There are plenty of cases (legitimate and illegitimate) where a binary target is used as a library. For the illegitimate case, typically what happens is that a library has if __name__ slapped inside it for some debugging or local development work and now it's considered a "binary".
Maybe the
py_wheelshould instead only depend on thePyInfoprovider (or something similar) to make it just work.
When you say "just" work - do you mean "barely" work? Or do you mean "it works and is not complicated"?
I imagine users using
py_binaryas apy_libraryand not being able to package your code without changing your project structure sounds like a huge rough corner.
Yup. I try to tell people to make a stub-style binary but it's an uphill battle to say the least.
My thought was maybe we could add a flag like py_wheel.include_binaries: bool = True. When False, the binaries (note: not the .py files) are stripped from the sdist/wheel contents before bundling. Does that sound like a reasonable (and feasible) solution?
When you say "just" work - do you mean "barely" work? Or do you mean "it works and is not complicated"?
Either it has to "work and not be complicated" or it has to not be here at all. π
Maybe we could use the PyInfo provider, but I can't remember if it includes the .so files and similar. We have a need in general to be able to tell the venv machinery to put certain files into certain places, so maybe this will be easier once we solve the venv building in #2156.
I'm surprised that only the foo executable shows up. I would expect the entire Python runtime to also show up because the runfiles should be merged in. Is the posted example using workspace instead of bzlmod?
An executable can be detected by looking for PyExecutableInfo, however, that won't be sufficient. The binary's runfiles may contain the Python runtime, which can't easily be sorted out from the regular runfiles that should be included.
To do this properly, there has to be a stronger distinction between the "binary" and "library" parts of a py_binary target. Something like:
- Make py_binary produce a PyInfo the same as if it was a py_library
- Move the binary-specific parts into PyExecutableInfo
- Both PyInfo and PyExecutableInfo need separate attributes for their respective runfiles
- Consumers, like py_wheel, can then choose whether to use it as a library, binary, or generic target.
Posted example is bzlmod.
I dug a little deeper. The binary our_tables/foo is the boilerplate that starts with:
#!/usr/bin/env python3
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import sys
# The Python interpreter unconditionally prepends the directory containing this
...
I would expect the entire Python runtime to also show up because the runfiles should be merged in.
It looks like the only files that are merged in are:
https://github.com/bazel-contrib/rules_python/blob/3aea414aa2e5f1e0e915f58a434c01428a90382c/python/private/py_wheel.bzl#L329
The direct_pyi_files are:
DEBUG: .../py_wheel.bzl:328:10: ctx.files.deps=[<source file src/doug/foo.py>, <generated file src/doug/foo>]
DEBUG: .../py_wheel.bzl:329:10: Direct pyi files=[]
DEBUG: .../py_wheel.bzl:333:10: inputs_to_package=depset([<source file src/doug/foo.py>, <generated file src/doug/foo>])
I don't yet know enough about internals, but maybe a very naive "drop if is generated file and does not have .py extension" check might be sufficient?
We'll want to be careful about generated files that are not generated from a py_binary, though. There are rare but legitimate causes for generating a python library file (but I'm not aware of any cases where the generated file would not have the .py extension and you'd still want it as part of the wheel).
So I have no idea what the impact of this is, but this patch results in no foo binary being added to the wheel (patch base: 1.4.1). Like, I'd imagine that this would probably break data dependencies being added, but it's a start!
diff --git a/python/private/py_wheel.bzl b/python/private/py_wheel.bzl
index c196ca6a..991d6555 100644
--- a/python/private/py_wheel.bzl
+++ b/python/private/py_wheel.bzl
@@ -337,6 +337,8 @@ def _py_wheel_impl(ctx):
packageinputfile = ctx.actions.declare_file(ctx.attr.name + "_target_wrapped_inputs.txt")
content = ""
for input_file in inputs_to_package.to_list():
+ if not input_file.is_source:
+ continue
content += _input_file_to_arg(input_file) + "\n"
ctx.actions.write(output = packageinputfile, content = content)
other_inputs.append(packageinputfile)
$ less bazel-bin/src/doug/our_tables-0.0.1-py3-none-any.whl | cat
Metadata-Version: 2.1
Name: our_tables
Version: 0.0.1
Archive: bazel-bin/src/doug/our_tables-0.0.1-py3-none-any.whl
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
62 Defl:N 62 0% 1980-01-01 00:00 3431507e doug/foo.py
91 Defl:N 86 6% 1980-01-01 00:00 4f1122ec our_tables-0.0.1.dist-info/WHEEL
63 Defl:N 56 11% 1980-01-01 00:00 6d200e48 our_tables-0.0.1.dist-info/METADATA
274 Defl:N 200 27% 1980-01-01 00:00 d92b0407 our_tables-0.0.1.dist-info/RECORD
-------- ------- --- -------
490 404 18% 4 files
I'd imagine that this would probably break data dependencies being added
Actually it looks like it's OK for the basic data cases such as filegroup targets or direct file targets:
$ bazel build //src/doug:wheel
$ less bazel-bin/src/doug/our_tables-0.0.1-py3-none-any.whl | cat
Metadata-Version: 2.1
Name: our_tables
Version: 0.0.1
Archive: bazel-bin/src/doug/our_tables-0.0.1-py3-none-any.whl
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
10 Defl:N 12 -20% 1980-01-01 00:00 c801f919 doug/data.txt
62 Defl:N 62 0% 1980-01-01 00:00 3431507e doug/foo.py
15 Defl:N 17 -13% 1980-01-01 00:00 d27ea5b1 doug/more_data.txt
91 Defl:N 86 6% 1980-01-01 00:00 4f1122ec our_tables-0.0.1.dist-info/WHEEL
63 Defl:N 56 11% 1980-01-01 00:00 6d200e48 our_tables-0.0.1.dist-info/METADATA
420 Defl:N 293 30% 1980-01-01 00:00 942311a4 our_tables-0.0.1.dist-info/RECORD
-------- ------- --- -------
661 526 20% 6 files
But data targets that are the result of a genrule or other generated file will be excluded. This is expected given the naivetΓ© of the patch.
Obviously the 4-point description that @rickeylev mentioned in https://github.com/bazel-contrib/rules_python/issues/2923#issuecomment-2906039765 is the more robust way to go about things. I just don't have the knowledge to implement such π.
Oh, huh. Looking at the py_wheel code, it doesn't take the run files? Just the immediate default outputs? That's surprising. It is beneficial for this situation though -- we don't have to worry about transitive run files merging so just have to split the immediate binary specific outputs out of PyInfo and into PyExecutableInfo
Oh, huh. Looking at the py_wheel code, it doesn't take the run files? Just the immediate default outputs? That's surprising. It is beneficial for this situation though -- we don't have to worry about transitive run files merging so just have to split the immediate binary specific outputs out of PyInfo and into PyExecutableInfo
The problem with this is it then is missing any data dependencies which may be required e.g. a python C extension. In general you can't reproduce the correct configuration for the data attribute at the py_wheel stage and so really py_wheel and conversely py_package should be packaging runfiles / data as is as its the only way to get the correct configuration for these targets in the general case.