rules_python icon indicating copy to clipboard operation
rules_python copied to clipboard

[Question] Can pip.override patch an internal dependency?

Open dougthor42 opened this issue 1 year ago • 4 comments

Can pip.override patch an internal rules_python dependency?

Specifically, this installer PR https://github.com/pypa/installer/pull/216 adds support for a overwrite_existing argument to SchemeDictionaryDestination and it seems to be stalled. It's a blocker for us rolling out Bazel.

I have patched python/pip_install/tools/wheel_installer/wheel.py to use the new arg, and during all my local testing have just been manually editing ~/.cache/bazel/_bazel_dthor/<hash>/external/rules_python~~internal_deps~pypi__installer/installer/destinations.py to get things to work. Obviously that's not suitable for long term haha.

My naive attempt is:

# MODULE.bazel
# Use a pre-release version of rules_python and the gazelle plugin. This mostly
# provides additional gazelle directives, but also allows the use of the bazel
# downloader for pip packages, which greatly increases caching.
git_override(
    module_name = "rules_python",
    commit = "55f31a306c8d69b46e50a1b2618c93081d4e11e8",
    patch_strip = 1,  # Remove the "a/" and "b/" prefixes from paths in the patch file.
    # Patch rules_python because qtconsole gets created multiple times.
    # Later on we have to patch `installer` similarly.
    # See https://github.com/pypa/installer/pull/216 and
    # https://github.com/dougthor42/rules_python/tree/doug
    patches = [
        "tools/bazel/patches/allow_overwriting_qtconsole.patch",
    ],
    remote = "https://github.com/bazelbuild/rules_python",
)

# Patch 'installer' for https://github.com/pypa/installer/pull/216
pip.override(
    file = "installer-0.7.0-py3-none-any.whl",
    patch_strip = 2,
    patches = [
        "tools/bazel/patches/installer_gh216.patch",
    ],
)

The git_override works just fine. The pip.override does not.

Background

Without https://github.com/pypa/installer/pull/216 and the patch to rules_python/.../wheel.py, I get this error:

===== stderr start =====
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~/python/pip_install/tools/wheel_installer/wheel_installer.py", line 205, in <module>
    main()
  File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~/python/pip_install/tools/wheel_installer/wheel_installer.py", line 160, in main
    _extract_wheel(
  File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~/python/pip_install/tools/wheel_installer/wheel_installer.py", line 121, in _extract_wheel
    whl.unzip(installation_dir)
  File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~/python/pip_install/tools/wheel_installer/wheel.py", line 619, in unzip
    installer.install(
  File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~~internal_deps~pypi__installer/installer/_core.py", line 109, in install
    record = destination.write_file(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~~internal_deps~pypi__installer/installer/destinations.py", line 205, in write_file
    return self.write_to_fs(
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~~internal_deps~pypi__installer/installer/destinations.py", line 169, in write_to_fs
    raise FileExistsError(message)
FileExistsError: File already exists: ./bin/jupyter-qtconsole
===== stderr end =====

dougthor42 avatar May 16 '24 18:05 dougthor42

This patching should be done in the invocations to the http_archive in python/pip_install/repositories.bzl in order work properly. We are not downloading the installer via the pip extension because the extension depends on the installer library to work. And we are reusing the old code from the WORKSPACE times that was structured like this.

Is this PR in the installer something that rules_python should have? Why do you need the patch?

If you would like to just use the installer with a patch, you could setup your own requirements.txt file and use the pip.override to patch it, but I assume that this is not something that you can do easily since you opened this issue.

aignas avatar May 17 '24 01:05 aignas

Nah it's not that our code uses installer. It's not a dependency nor transitive dependency of anything we have in requirements.in. If we were using installer, you're right it would be pretty easy to fix.

I'll try to make a more comprehensive minimal reproduction, but the summary is:

  1. make a requirements.in that includes jupyter==1.0.0. This is a python metapackage that installs "the Jupyter system, including the notebook, qtconsole, and the IPython kernel."
  2. update requirements lockfile.
  3. update manifest
  4. add qtconsole==5.3.0, even though jupyter includes qtconsole as a dependency.
    • This is a lower version than was resolved by the dep resolver in step 2.
  5. update requirements lockfile
  6. update manifest. At this point bazel errors out with the logs in the OP.

dougthor42 avatar May 17 '24 03:05 dougthor42

Oh, and here are the two patches that I use to get things working:

diff --git a/python/pip_install/tools/wheel_installer/wheel.py b/python/pip_install/tools/wheel_installer/wheel.py
index 750ebfcf..51f1a286 100644
--- a/python/pip_install/tools/wheel_installer/wheel.py
+++ b/python/pip_install/tools/wheel_installer/wheel.py
@@ -606,6 +606,10 @@ class Wheel:
             "scripts": "/bin",
             "data": "/data",
         }
+
+        # Hack
+        overwrite = "qtconsole" in str(self.path)
+
         destination = installer.destinations.SchemeDictionaryDestination(
             installation_schemes,
             # TODO Should entry_point scripts also be handled by installer rather than custom code?
@@ -613,6 +617,7 @@ class Wheel:
             script_kind="posix",
             destdir=directory,
             bytecode_optimization_levels=[],
+            overwrite_existing = overwrite,
         )
 
         with installer.sources.WheelFile.open(self.path) as wheel_source:
/usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/external/rules_python~~internal_deps~pypi__installer/installer/destinations.py
         hash_algorithm: str = "sha256",
         bytecode_optimization_levels: Collection[int] = (),
         destdir: Optional[str] = None,
+        overwrite_existing: bool = False,
     ) -> None:
         """Construct a ``SchemeDictionaryDestination`` object.
@@ -128,13 +129,15 @@ def __init__(
         :param destdir: A staging directory in which to write all files. This
             is expected to be the filesystem root at runtime, so embedded paths
             will be written as though this was the root.
         :param overwrite_existing: silently overwrite existing files.
         """
         self.scheme_dict = scheme_dict
         self.interpreter = interpreter
         self.script_kind = script_kind
         self.hash_algorithm = hash_algorithm
         self.bytecode_optimization_levels = bytecode_optimization_levels
         self.destdir = destdir
+        self.overwrite_existing = overwrite_existing

     def _path_with_destdir(self, scheme: Scheme, path: str) -> str:
         file = os.path.join(self.scheme_dict[scheme], path)
@@ -162,7 +165,7 @@ def write_to_fs(
         - Hashes the written content, to determine the entry in the ``RECORD`` file.
         """
         target_path = self._path_with_destdir(scheme, path)
-        if os.path.exists(target_path):
+        if not self.overwrite_existing and os.path.exists(target_path):
         message = f"File already exists: {target_path}"
         raise FileExistsError(message)

dougthor42 avatar May 17 '24 06:05 dougthor42

I think the issue is that jupyter is a metapackage, meaning it installs a bunch of other ill-defined packages. Running pip install jupyter=1.0.0 today may install a different version of, say, jupyter-core or qtconsole than yesterday.

jupyter==1.0.0 gets added to the lock file. rules_python installs this and eventually makes ./bin/jupyter-qtconsole. Then rules_python gets to qtconsole in the lock file, which is a different version, and tries to install it again. The installer dependency barfs because ./bin/jupyter-qtconsole already exists.

At least that's my theory.

If you add qtconsole==5.5.2 instead of 5.3.0, rules_python doesn't complain.


So all that to say: there's a workaround and I no longer need to patch rules_python or the internal deps. I just have to remove the jupyter metapackage from my project's requirements.in.

That said, having the ability to patch an internal dep may be useful elsewhere. It's just a much lower priority for me now :laughing:.

dougthor42 avatar May 17 '24 22:05 dougthor42

I would be still interested in getting to the root cause of this.

I thought that whenever we setup a new package, we are doing this in a clean directory and I am not sure about why we would be reusing the directory for installs of packages. I'll leave this open for the time being as there are 2 ways to resolve this:

  • Include the patch in rules_python because the installer dep could be better.
  • Close as won't do as the users can patch the MODULE.bazel or the codebase of the rules_python itself to modify the installer http_archive call to include patches. Maybe if we wanted this to be easier for the users, we could add another field in the tuple for patches.

aignas avatar May 19 '24 03:05 aignas

Ah ok, so nested patching is one option. I didn't think of that - it sounds fun, haha. I think that option (2) is just fine. It should be rare that we have to do such a thing.

I did more digging. An even more minimal reproduction is to have only qtconsole in the requirements.in file. qtconsole <= 5.3.0 fail, while > 5.3.0 are fine.

The bug is https://github.com/jupyter/qtconsole/issues/529. The fix is https://github.com/jupyter/qtconsole/commit/8179ac8c696db2b3ea4df59f44095280a1a059ff which was part of their 5.3.1 release.

So it wasn't an issue with rules_python at all. Huzzah!

If you want to play around more, you can use https://github.com/dougthor42/rules_python_gh1906.

dougthor42 avatar May 19 '24 05:05 dougthor42

Thanks for digging this a little more. I'll close this as it is an upstream issue. If othecs feel strongly about including the patch in rules_python, let me know.

aignas avatar May 19 '24 14:05 aignas