rules_python Improve the `pip.parse` API to allow for incremental building of the configuration

Currently the API for defining different parameters by target platform is not ideal and we need to jump over hoops whilst trying to maintain it. The main difficulties are:

We would like to specify different pip.parse attributes per target platform, but we cannot, because some of the parameters need to be labels, label lists or other.
We cannot have overrides for these things easily and the definitions of what those target platforms are are hard-coded in the code.

The idea that I have is to reuse the same recipe from #2578, where we create a builder for the configuration and it allows us easily define:

default constraint and flag values for the config settings that we create for different platform variants.
default values for the pip index url, etc
default values for which platforms we should support.
default auth configuration for the bazel_downloader.
The list of python_versions that are supported #1708.
build the configuration incrementally for the pip.parse invocation allowing per-target-platform configuration of:
- extra_pip_args. #2745
- requirements_lock.
- constraint_values and flag_values. #2548

What is more the override API could be blended in more easily to provide better support for specifying different, patches, etc.

TODO:

[ ] Design defaults and configure APIs and use that to set the defaults within rules_python for selecting the whls.
[ ] Do the same for index_url setting, at some point rules_python could set the default index_url to default to the new code path.
[ ] Transition the current pip.parse to use pip.configure under the hood.

Apr 07 '25 01:04 aignas

Taking the example from https://github.com/bazel-contrib/rules_python/pull/2578#issuecomment-2628784221

We should have something like:

pip.defaults(
    hub_name = `pypi`,
    index_url = "https://pypi.org/simple/", # once we want to enable the `experimental_index_url` for everyone or all pip.parse repos regardless where they come from.
    index_url_overrides = {},
)
pip.defaults(
    platform = "cp39_linux_x86_64",
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "//python/config_settings:python_version_major_minor": "3.9",
    }
)
pip.defaults(
    platform = "cp313t_linux_x86_64",
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "//python/config_settings:python_version_major_minor": "3.13",
        "//python/config_settings:py_freethreaded": "yes",
    }
)
pip.defaults(
    platform = "cp313_linux_x86_64",
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "//python/config_settings:python_version_major_minor": "3.13",
        "//python/config_settings:py_freethreaded": "no",
    }
)
# At user site
pip.configure(
    name = "pypi",
    pip_args = ["default"],
)
pip.configure(
    name = "pypi",
    requirements = "//:requirements_darwin.txt",
    platform = "cp313t_osx_x86_64",
    pip_args = ["only for this platform"],
)
pip.configure(
    name = "pypi",
    requirements = "//:requirements_linux_gpu.txt",
    platform = "cp313gpu_linux_x86_64",
    pip_args = ["only for this platform"],
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "//my_custom_flag": "cuda",
        "@rules_python//python/config_settings:python_version_major_minor": "3.13",
        "@rules_python//python/config_settings:py_freethreaded": "no",
    },
)
pip.configure(
    name = "pypi",
    requirements = "//:requirements_linux_3_10_11_workaround.txt",
    platform = "cp310.11_linux_x86_64",
    pip_args = ["only for this platform"],
    constraint_values = ["@platforms//os:linux", "@platforms//cpu:x86_64"],
    flag_values = {
        "@rules_python//python/config_settings:python_version_major_minor": "3.10.11",
    },
)

I am not sure if all of these permutations are really necessary, but the whole override idea is really nice.

Whilst the uv the uv_version is the main key where the configuration is built/modified. Here we could have the hub_name.

OK, that is it for now.

Apr 07 '25 01:04 aignas

One override case we wound up needing, which lead to https://github.com/bazel-contrib/rules_python/pull/2813, is that we have 2 separate pip hubs both for linux x86_64 that differ on other hardware constraints, specifically if there is a GPU or not. In this case we want to patch a wheel only in 1 of the cases because the dependency has different behavior on CPU vs GPU, so the patch is only valid in 1 of the hubs, but the wheel name is the same on both. This lead me to writing the linked PR and using it like this:

pip.override(
    file = "xgrammar-0.1.18-cp{version}-cp{version}-manylinux_2_17_x86_64.manylinux2014_x86_64.whl".format(version = DEFAULT_PYTHON_VERSION.replace(".", "")),
    hub_name = "gpu_deps",
    patch_strip = 1,
    patches = [
      ...
    ],
)

In previous cases I was able to sidestep this because the wheel name can potentially have the GPU specifics, as is the case for torch, so we defacto got the same behavior without the need for specifying hub_name.

This case could also be solved if use_extension(isolated=True) worked, but I saw in some other rules_python issues that it might not. I briefly tried it and it did work, but I'm not sure what later issues I would hit

Apr 24 '25 17:04 keith

The isolated = True should work I think, if it does not feel free to raise tickets separately.

Apr 24 '25 23:04 aignas

Another similar case I saw yesterday: jax tests with both numpy 1 and numpy 2. Having separate requirements.txt files would be fine (requirements.txt can't model this, so I don't see how to do it otherwise), but having separate pip hubs would be convoluted/confusing. Having a project-specific flag (like the cuda one mentioned in the second post) would be much easier.

Apr 25 '25 22:04 rickeylev

Some notes before I sign off:

I have created 3 PRs that thread through custom target setting passing. Thanks for the nudge @rickeylev. The 3rd PR is not that much and it seems to be pretty straightforward.
We still need to do some cleanup/simplification of the code so that stuff becomes more maintainable:
- select_whls should return 1 wheel that is most specialized for a given platform.
- The user should be able to customize the prioritize of which wheels are preferred by specifying a list like ["musllinux_*_x86_64", "manylinux_*_x86_64", "linux_*_x86_64", "any"].
- The whlname-specific config settings are not used anymore and we always use the target_platforms instead, which should match to the single most-specialized wheel returned by select_whls.
- Remove usage of whl_target_platforms - having the hardcoded list of CPUs is not configurable and it is hard to make it so.

Jun 15 '25 14:06 aignas

See a comment in https://github.com/bazel-contrib/rules_python/pull/3255#issuecomment-3292622784 for what me want to support in whl_platform_tags.

Sep 15 '25 14:09 aignas