Improve the `pip.parse` API to allow for incremental building of the configuration
Currently the API for defining different parameters by target platform is not ideal and we need to jump through hoops whilst trying to maintain it. The main difficulties are:
- We would like to specify different `pip.parse` attributes per target platform, but we cannot, because some of the parameters need to be labels, label lists or other.
- We cannot have overrides for these things easily and the definitions of what those target platforms are are hard-coded in the code.
The idea that I have is to reuse the same recipe from #2578, where we create a builder for the configuration and it allows us to easily define:
- default constraint and flag values for the config settings that we create for different platform variants.
- default values for the pip index url, etc
- default values for which platforms we should support.
- default `auth` configuration for the `bazel_downloader`.
- The list of `python_versions` that are supported #1708.
- build the configuration incrementally for the `pip.parse` invocation allowing per-target-platform configuration of:
  - `extra_pip_args`. #2745
  - `requirements_lock`.
  - `constraint_values` and `flag_values`. #2548
What is more, the override API could be blended in more easily to provide better support for specifying different patches, etc.
TODO:
- [ ] Design `defaults` and `configure` APIs and use that to set the defaults within `rules_python` for selecting the whls.
- [ ] Do the same for the `index_url` setting; at some point `rules_python` could set the default `index_url` to default to the new code path.
- [ ] Transition the current `pip.parse` to use `pip.configure` under the hood.
Taking the example from https://github.com/bazel-contrib/rules_python/pull/2578#issuecomment-2628784221
We should have something like:
pip.defaults(
    hub_name = "pypi",
    index_url = "https://pypi.org/simple/", # once we want to enable the `experimental_index_url` for everyone or all pip.parse repos regardless where they come from.
    index_url_overrides = {},
)
pip.defaults(
    platform = "cp39_linux_x86_64",
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "//python/config_settings:python_version_major_minor": "3.9",
    },
)
pip.defaults(
    platform = "cp313t_linux_x86_64",
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "//python/config_settings:python_version_major_minor": "3.13",
        "//python/config_settings:py_freethreaded": "yes",
    },
)
pip.defaults(
    platform = "cp313_linux_x86_64",
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "//python/config_settings:python_version_major_minor": "3.13",
        "//python/config_settings:py_freethreaded": "no",
    },
)
# At user site
pip.configure(
    name = "pypi",
    pip_args = ["default"],
)
pip.configure(
    name = "pypi",
    requirements = "//:requirements_darwin.txt",
    platform = "cp313t_osx_x86_64",
    pip_args = ["only for this platform"],
)
pip.configure(
    name = "pypi",
    requirements = "//:requirements_linux_gpu.txt",
    platform = "cp313gpu_linux_x86_64",
    pip_args = ["only for this platform"],
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "//my_custom_flag": "cuda",
        "@rules_python//python/config_settings:python_version_major_minor": "3.13",
        "@rules_python//python/config_settings:py_freethreaded": "no",
    },
)
pip.configure(
    name = "pypi",
    requirements = "//:requirements_linux_3_10_11_workaround.txt",
    platform = "cp310.11_linux_x86_64",
    pip_args = ["only for this platform"],
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "@rules_python//python/config_settings:python_version_major_minor": "3.10.11",
    },
)
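To make the intended layering concrete, here is a minimal sketch in plain Python (not Starlark, and not the `rules_python` implementation) of how `pip.defaults` and `pip.configure` tags could be folded into per-platform settings for one hub. The function name `build_hub` and the dict-based tag representation are hypothetical. It assumes that tags without a `platform` set hub-wide values, that later tags win over earlier ones, and that platform-specific values win over hub-wide ones.

```python
# Hypothetical sketch: `build_hub` and the dict-shaped tags are illustrative
# assumptions, not part of the rules_python API.

def build_hub(default_tags, configure_tags, hub_name):
    """Fold defaults and user tags into {platform: settings} for one hub."""
    hub_wide = {}      # values from tags that do not name a platform
    per_platform = {}  # values from tags that name a platform

    # Only the configure tags addressed to this hub take part in the merge.
    own_tags = [tag for tag in configure_tags if tag["name"] == hub_name]

    # Defaults are applied first so that user tags can override them.
    for tag in list(default_tags) + own_tags:
        settings = {k: v for k, v in tag.items() if k not in ("name", "platform")}
        platform = tag.get("platform")
        if platform is None:
            hub_wide.update(settings)
        else:
            per_platform.setdefault(platform, {}).update(settings)

    # Each platform inherits the hub-wide values it did not override.
    return {p: {**hub_wide, **s} for p, s in per_platform.items()}


defaults = [
    {"index_url": "https://pypi.org/simple/"},
    {
        "platform": "cp313_linux_x86_64",
        "constraint_values": ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    },
]
configure = [
    {"name": "pypi", "pip_args": ["default"]},
    {
        "name": "pypi",
        "platform": "cp313_linux_x86_64",
        "requirements": "//:requirements_linux.txt",
        "pip_args": ["only for this platform"],
    },
    {"name": "other_hub", "pip_args": ["ignored for pypi"]},
]

hub = build_hub(defaults, configure, "pypi")
```

In this sketch the `cp313_linux_x86_64` entry keeps the default `index_url` and `constraint_values`, while its `pip_args` come from the platform-specific `configure` call rather than the hub-wide one.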
I am not sure if all of these permutations are really necessary, but the whole override idea is really nice.
Whilst for `uv` the `uv_version` is the main key under which the configuration is built/modified, here we could have the `hub_name`.
OK, that is it for now.
One override case we wound up needing, which led to https://github.com/bazel-contrib/rules_python/pull/2813, is that we have 2 separate pip hubs both for linux x86_64 that differ on other hardware constraints, specifically whether there is a GPU or not. In this case we want to patch a wheel only in 1 of the cases because the dependency has different behavior on CPU vs GPU, so the patch is only valid in 1 of the hubs, but the wheel name is the same on both. This led me to writing the linked PR and using it like this:
pip.override(
    file = "xgrammar-0.1.18-cp{version}-cp{version}-manylinux_2_17_x86_64.manylinux2014_x86_64.whl".format(version = DEFAULT_PYTHON_VERSION.replace(".", "")),
    hub_name = "gpu_deps",
    patch_strip = 1,
    patches = [
        ...
    ],
)
In previous cases I was able to sidestep this because the wheel name can potentially have the GPU specifics, as is the case for torch, so we de facto got the same behavior without the need for specifying `hub_name`.
This case could also be solved if `use_extension(isolated=True)` worked, but I saw in some other `rules_python` issues that it might not. I briefly tried it and it did work, but I'm not sure what later issues I would hit.
`isolated = True` should work, I think. If it does not, feel free to raise tickets separately.
Another similar case I saw yesterday: jax tests with both numpy 1 and numpy 2. Having separate requirements.txt files would be fine (requirements.txt can't model this, so I don't see how to do it otherwise), but having separate pip hubs would be convoluted/confusing. Having a project-specific flag (like the cuda one mentioned in the second post) would be much easier.
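As a rough illustration of the flag-based approach, the following Python sketch picks a requirements file from the flags that are currently set. Everything here is hypothetical: the flag `//:numpy_major`, the requirements labels, and the helper names are made up for the example. The matching rule is a simplification (prefer the match with the most flags) and not Bazel's actual `select()` resolution, which reports ambiguous matches as errors.

```python
# Hypothetical sketch: flag names, labels and helpers are assumptions made
# for illustration; this is not how Bazel or rules_python resolve settings.

def matches(flag_values, active_flags):
    """True if every flag the candidate requires has the requested value."""
    return all(active_flags.get(flag) == value for flag, value in flag_values.items())


def pick_requirements(candidates, active_flags):
    """Return the requirements label of the most specific matching candidate."""
    matching = [c for c in candidates if matches(c["flag_values"], active_flags)]
    if not matching:
        return None
    # Simplification: the candidate that constrains the most flags wins.
    best = max(matching, key=lambda c: len(c["flag_values"]))
    return best["requirements"]


candidates = [
    # Fallback with no flag constraints, used when nothing more specific matches.
    {"requirements": "//:requirements.txt", "flag_values": {}},
    {"requirements": "//:requirements_numpy1.txt", "flag_values": {"//:numpy_major": "1"}},
    {"requirements": "//:requirements_numpy2.txt", "flag_values": {"//:numpy_major": "2"}},
]

with_numpy2 = pick_requirements(candidates, {"//:numpy_major": "2"})
without_flag = pick_requirements(candidates, {})
```

With a single hub and one project-specific flag, the test matrix only has to flip that flag instead of switching between separate hubs.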
Some notes before I sign off:
- I have created 3 PRs that thread through custom target setting passing. Thanks for the nudge @rickeylev. The 3rd PR is not that much and it seems to be pretty straightforward.
- We still need to do some cleanup/simplification of the code so that stuff becomes more maintainable:
  - `select_whls` should return 1 wheel that is most specialized for a given platform.
  - The user should be able to customize the priority of which wheels are preferred by specifying a list like `["musllinux_*_x86_64", "manylinux_*_x86_64", "linux_*_x86_64", "any"]`.
  - The whl name-specific config settings are not used anymore and we always use the `target_platforms` instead, which should match the single most-specialized wheel returned by `select_whls`.
  - Remove usage of `whl_target_platforms` - having the hardcoded list of CPUs is not configurable and it is hard to make it so.
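The wheel-selection idea in the notes above could look roughly like the Python sketch below. It is not the `select_whls` implementation. The helper names are hypothetical, the glob matching uses Python's `fnmatch` as a stand-in for whatever matcher `rules_python` ends up with, and it assumes that earlier entries in the priority list are preferred over later ones, which the thread does not state.

```python
from fnmatch import fnmatchcase

# Hypothetical sketch: `platform_tags` and `select_whl` are illustrative names.
# Assumption: patterns earlier in `priority` are preferred over later ones.


def platform_tags(whl_filename):
    """Return the platform tags of a wheel filename.

    The platform tag is the last dash-separated field before `.whl`; several
    tags may be compressed into it, separated by dots.
    """
    stem = whl_filename[: -len(".whl")]
    return stem.split("-")[-1].split(".")


def select_whl(whl_filenames, priority):
    """Return the single wheel whose tag matches the best-ranked pattern."""
    best = None
    best_rank = len(priority)  # worse than any real rank
    for filename in whl_filenames:
        for tag in platform_tags(filename):
            for rank, pattern in enumerate(priority):
                if rank < best_rank and fnmatchcase(tag, pattern):
                    best, best_rank = filename, rank
    return best


wheels = [
    "foo-1.0-py3-none-any.whl",
    "foo-1.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "foo-1.0-cp313-cp313-musllinux_1_2_x86_64.whl",
]
priority = ["musllinux_*_x86_64", "manylinux_*_x86_64", "linux_*_x86_64", "any"]

preferred = select_whl(wheels, priority)
glibc_first = select_whl(wheels, ["manylinux_*_x86_64", "any"])
```

Returning exactly one wheel per platform is what would let the `target_platforms` settings replace the whl name-specific config settings.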
See a comment in https://github.com/bazel-contrib/rules_python/pull/3255#issuecomment-3292622784 for what we want to support in `whl_platform_tags`.