RFC: poetry_install rule
I've written a poetry_install rule and would like to upstream it.
High-level design
From my reading of https://github.com/soniaai/rules_poetry it runs pip to download and install individual wheels, and I don't understand why you would do it that way.
My proposal is that in a repository rule,
- figure out the path to the python interpreter
- http_archive to fetch poetry distro like https://github.com/python-poetry/poetry/releases/download/1.0.9/poetry-1.0.9-darwin.tar.gz
- stamp out a 10 line launcher similar to https://github.com/python-poetry/poetry/blob/1.0.9/get-poetry.py#L200-L218 so we don't have to "install" poetry, merely invoke it
- create the new repository directory and copy/symlink pyproject.toml and poetry.lock there
- simply run `python poetry_bin.py install --no-interaction --no-ansi` in that working directory, with `"POETRY_VIRTUALENVS_IN_PROJECT": "true"` in the env. Now you get `[output_base]/external/some_dir_deps/.venv/.../site-packages` with all the stuff poetry gave you
- run a generate_build_files.py to populate the directory with `py_library` targets. This program always runs after poetry, so you can look at whatever you like on disk to mirror the dependency graph and sources into BUILD files
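A minimal sketch of what that generate_build_files.py step could look like. The flat one-`py_library`-per-package layout and the `.dist-info` filtering here are my assumptions for illustration, not the prototype's actual code:

```python
import os
import textwrap

def generate_build_file(site_packages):
    """Render BUILD file text exposing each top-level package found in
    the given site-packages directory as a py_library, plus an :all
    target aggregating everything (hypothetical layout)."""
    pkgs = sorted(
        name
        for name in os.listdir(site_packages)
        if os.path.isdir(os.path.join(site_packages, name))
        and not name.endswith((".dist-info", ".egg-info"))
    )
    parts = ['package(default_visibility = ["//visibility:public"])\n']
    for pkg in pkgs:
        parts.append(textwrap.dedent("""\
            py_library(
                name = "{pkg}",
                srcs = glob(["{pkg}/**/*.py"]),
                data = glob(["{pkg}/**/*"], exclude = ["{pkg}/**/*.py"]),
                imports = ["."],
            )
        """).format(pkg=pkg))
    parts.append(
        'py_library(\n    name = "all",\n    deps = [{}],\n)\n'.format(
            ", ".join('":{}"'.format(p) for p in pkgs)
        )
    )
    return "\n".join(parts)
```

A real version could additionally read each distribution's METADATA to mirror inter-package dependencies into per-target `deps`, as the step above describes.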
It's simple and does exactly the same thing poetry install does outside of Bazel. Here's the prototype code: https://gist.github.com/alexeagle/9fd6684e9306cf741f246dd3518a48ec
How it should look for users
In WORKSPACE you should:
load("@rules_python//poetry:install.bzl", "poetry_install")
poetry_install(
# free for users to select name
name = "some_dir_deps",
lockfile = "//some/dir:poetry.lock",
pyproject = "//some/dir:pyproject.toml",
# optionally tell what interpreter binary to use, otherwise it should default to
# whatever `toolchain_type = "@bazel_tools//tools/python:toolchain_type"` is in `register_toolchains`
python_interpreter = "@python_interpreter//:python_bin",
)
(it has no transitive dependencies)
That should create a @some_dir_deps repository with these labels:
- `@some_dir_deps//:all` - a convenience `py_library` exposing all the dependencies. Useful since there is no BUILD file generator, and it matches existing semantics, so it makes migration easier
- `@some_dir_deps//pkg` - a `py_library` providing the `pkg` package
TODO: I need to figure out how "extras" should work - I think that's the reason pip makes you load a "requirements" helper function?
So you'd use it in your BUILD file:
py_library(
    ...
    deps = ["@some_dir_deps//statsd"],
)
(no load statement is required)
Here's my plan:
- pick where it will go. The README already indicates that "The packaging rules (pip_import, etc.) are less stable. We may make breaking changes as they evolve." so my preference is a new top-level `poetry` folder
- figure out the story for extras and native package dependencies
- make sure the install runs only once. For example, on a CI setup we need to make sure `--repository_cache` is shared among workers (maybe requires better documentation on bazel.build)
- add a first version with a bazel-integration-test'ed example directory
I'm not a contributor, only a user of rules_python and rules_python_external.
My question(s): Why poetry and why this repo?
I'd personally prefer to see the existing rules improved for standard pip first, instead of the secondary tools like pipenv or poetry. There is good work in: https://github.com/dillon-giacoppo/rules_python_external that I would personally like to see upstreamed here instead.
If there is a desire for a "lockfile", it is perfectly valid to use a requirements.txt with all the hashes, produced by pip-tools.
See: https://hynek.me/articles/python-app-deps-2018/#pip-tools--everything-old-is-new-again
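For a concrete picture, a `pip-compile --generate-hashes` run over a requirements.in containing only `requests` pins the entire transitive closure with hashes. The output is shaped roughly like this (illustrative versions; hashes abbreviated to placeholders):

```
# This file is autogenerated by pip-compile
# To update, run: pip-compile --generate-hashes requirements.in
requests==2.24.0 \
    --hash=sha256:<wheel hash> \
    --hash=sha256:<sdist hash>
    # via -r requirements.in
urllib3==1.25.10 \
    --hash=sha256:<wheel hash> \
    --hash=sha256:<sdist hash>
    # via requests
```

Note that transitive deps like urllib3 appear pinned and hashed even though only the direct dep was listed in requirements.in.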
poetry and pipenv etc. all end up using pip under the covers anyway, and with the new dependency resolver coming to pip, I'm wondering if there won't be a flight back to plain old pip.
I'm not saying I'd be against poetry (I've looked deeply into it too), rather that I'd like to see consistent treatment of packaging tools. If the contributions you are proposing can't be upstreamed to rules_poetry, why should they be added to the official rules over others?
Of course, I'd be generally supportive of seeing rules_python centralise support for all of pip-tools, poetry or pipenv so the community isn't fragmented, but equally, these rules still carry some Google baggage, so the general push has been to create external rules.
IMO we should fix the Google baggage here, and make this a monorepo that houses as much high-quality, self-consistent stuff as the maintainer community has time to properly support. So yes to pip, poetry, pipenv, and other package managers, provided of course that we minimize the surface area to what's strictly needed.
My understanding is that poetry is the only package manager that pins your transitive dependencies. It's not sufficient to just give the hashes of direct dependencies - you can still get a non-reproducible build if a newer version of a transitive dep is published and satisfies the semver constraints of the direct dependencies. If there isn't a file exhaustively listing the entire set of transitive dependencies, then it's not a sufficient guarantee IIUC.
Anyway even if poetry were feature-compatible with other package managers, I think we should make the Bazel onramp less steep by supporting all common existing codebases including their dependency manifest files.
> IMO we should fix the Google baggage here, and make this a monorepo that houses as much high-quality, self-consistent stuff as the maintainer community has time to properly support. So yes to pip, poetry, pipenv, and other package managers, provided of course that we minimize the surface area to what's strictly needed.
Yes, there have been recent movements which finally start to open this up as a possibility. rules_python is a bit different from the other language rules due to the history of how it is used in core bazelbuild/bazel; it isn't completely standalone like other language rules, unfortunately. Hence the core vs. packaging split. Like I said, I too would probably prefer a single set of "official" Python rules for the language and all ecosystem packaging. I also understand the viewpoint of having distinct rules, given that Bazel is so easily extensible with rules. However, I would prefer to see the "language native" tooling (pip) prioritised to the highest standard over the "secondary tools" like pipenv and poetry, particularly with the new resolver going into beta with pip 20.2.
> My understanding is that poetry is the only package manager that pins your transitive dependencies.
Probably off-topic to your RFC, but this is a common misconception. A fairly common way of "locking" a requirements.txt file is to use python -m piptools compile --generate-hashes requirements.in which produces a fully hashed/locked transitive closure. Before pipenv and poetry existed, all one needed was venv, pip-tools and pip. Now one can also use pipenv lock -r or poetry export to get a requirements.txt with the transitive dependency hashes.
Additional thought:
If it is decided to consolidate support for the various combinations of Python dependency management, package management, and project management tools and specifications used in the ecosystem (pip, pipenv, poetry, Pipfile, pyproject.toml, requirements.txt, etc.), there would probably also need to be some criteria and documented levels of support for accepting a tool or contribution. For a hypothetical example: would conda be accepted? I'd be wary of the rules being pulled into the quagmire that Python packaging/dependency management is often renowned for.
Finally:
Good discussion. I'm only an interested party. Will be good to hear what the maintainers and others think.
> Probably off-topic to your RFC, but this is a common misconception. A fairly common way of "locking" a requirements.txt file is to use python -m piptools compile --generate-hashes requirements.in which produces a fully hashed/locked transitive closure. Before pipenv and poetry existed, all one needed was venv, pip-tools and pip. Now one can also use pipenv lock -r or poetry export to get a requirements.txt with the transitive dependency hashes.
Ooh this is interesting to clear up! But pip-tools isn't part of the python distribution right? You'd still need to get that on your machine as a bootstrap step. Also it's not maintained by python core so it doesn't seem obviously more canonical than pipenv or poetry, it just predates them right?
pipenv actually vendored a bunch of the pip-tools code I think. So yeah it pre-dates.
We used to use Pipenv with Bazel but abandoned it in favour of pip-tools because the former's locking behaviour wasn't good enough for us.
So far I like that pip-tools does one thing, transitive locking, and does it well. It also means we keep our locked requirements 'out in the open' outside of Bazel's internals, which is nice when managing platform-dependent behaviour in Python's packaging ecosystem.
Platform-dependent behaviour is my biggest reservation with the above. Delegating the locking to poetry means you have low visibility into any platform inconsistency that arises. With pip-tools we have CI checks on the lock file that can detect platform inconsistency early.
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
@alexeagle
Hi I'm looking to implement poetry with bazel workflow as well. Have you found a good solution? Thanks!
@rlam3 nope, you'd have to use the prototype code I linked at the top. Alternatively maybe you can transform your poetry lock file into a fully-locked requirements file and use the pip_import rule
I've created the following genrule which converts my poetry.lock into a requirements_lock.txt for use with pip_parse and it works great!
genrule(
    name = "poetry_to_requirements",
    srcs = [
        "pyproject.toml",
        "poetry.lock",
    ],
    outs = ["requirements_lock.txt"],
    cmd = "cp $(SRCS) $(RULEDIR) && cd $(RULEDIR) && poetry export --output requirements_lock.txt",
)
It's similar in function to compile_pip_requirements. The major downside of both of these rules is that they need to be run manually and the results committed. That makes sense for compile_pip_requirements, since that rule is actually performing the freezing of transitive dependency versions, which could change between executions, making it non-reproducible. But in the case of the above genrule, the output requirements_lock.txt will always be the same as long as pyproject.toml and poetry.lock don't change.
In theory, one could modify pip_parse to accept a poetry_lock argument which would perform this step automatically, right? I might try to take a crack at it.
Yeah I agree if the poetry.lock provides all the needed info, it should be possible to consume it directly in the repository rule. Maybe this can already be done in userland though? Make your own repository rule which produces the requirements_lock.txt and writes to some @from_poetry//:requirements_lock.txt external repo, such that you could pass that label to the existing pip_parse rule?
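To make that concrete, here's a rough sketch of the conversion such a repository rule would perform. It assumes the poetry.lock 1.x layout (`[[package]]` tables plus a `[metadata.files]` section mapping names to hashes); the format is unstandardized and may change, real code would parse the TOML first, and `poetry export` already does all of this properly:

```python
def poetry_lock_to_requirements(lock):
    """Turn a parsed poetry.lock (here: an already-parsed dict mirroring
    the TOML structure) into pinned, hashed requirements.txt lines."""
    lines = []
    for pkg in lock.get("package", []):
        line = "{}=={}".format(pkg["name"], pkg["version"])
        # [metadata.files] maps package name -> list of {file, hash} entries
        for f in lock["metadata"]["files"].get(pkg["name"], []):
            line += " \\\n    --hash={}".format(f["hash"])
        lines.append(line)
    return "\n".join(lines) + "\n"
```

The repository rule would then write the result to a file and expose it as `@from_poetry//:requirements_lock.txt` for pip_parse to consume.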
Building poetry support into rules_python ought to include a hermetic toolchain for fetching poetry, so there's some work to do.
Repository rule is a good approach, I'll try to tackle it that way. My plan was to actually pull poetry-core from PyPI rather than the full CLI. The export command now exists as a plugin module, so you just need an instantiated Poetry object (provided by poetry-core) to be able to generate a requirements_lock.txt. So this should be possible in pure Python, with no CLI calls.
I've got a working POC of a poetry repository rule which generates a requirements lock file here: https://github.com/AndrewGuenther/rules_python_poetry_poc
It currently defers to whatever poetry is on the PATH, and I'm working on fetching the poetry export plugin now. The biggest gotcha I can see going forward with this is that the poetry lock file is not a standard format and the maintainers say it can change at any time. Not sure if there's really any way to deal with that, but it hasn't seen a breaking change as far as I'm aware.
Once I've got a toolchain for poetry, I'll write some more tests and work on merging it here. Sound good?
@jvolkman was looking into poetry as well, since it has a cross-platform lockfile which is nicer
Indeed. I've been using poetry as part of https://github.com/jvolkman/rules_pycross, albeit in an indirect manner.
Want to just throw out here that we've been using https://github.com/AndrewGuenther/rules_python_poetry_poc for over six months now and it has worked very well. I never got around to creating a poetry toolchain, but for anyone looking for a solution, I'd say it is out of "poc" territory now.
Any new updates here?
@jrroman I've built https://github.com/AndrewGuenther/rules_python_poetry and it has been working great at our company. It is a really thin wrapper around rules_python and just calls poetry export to generate a requirements.txt to feed into rules_python.
@AndrewGuenther that's very similar to what we did in JavaScript land with https://docs.aspect.build/rules/aspect_rules_js/docs/pnpm#update_pnpm_lock. In that case, the user supplies a lockfile from one package manager (yarn or npm) and we run a "pnpm import" to create a lockfile in the format our Bazel rules want.
It seems to me that your approach could easily be upstreamed into rules_python and we could close this issue.
@alexeagle The main thing missing from my implementation, and a bit outside of my Bazel expertise, is setting up a toolchain for poetry to make it fully hermetic. If you would accept a PR with the implementation as is, plus some additional docs, I'd be happy to contribute.
I started working on fetching poetry as a hermetic repository rule. Will update if I get it in a good shape.
I made it hermetic, see new README: https://github.com/bazelbuild/rules_python/compare/poetry
Still working to glue that together with your simple repo rule in AndrewGuenther/rules_python_poetry
@alexeagle This looks great! Anything I can do to help this get pulled into rules_python?
Yeah, this lets you run poetry in an action, but not a repository rule. It could be an input to write_source_files to vendor the resulting lock files into your repository. I just haven't had time/funding to carry this forward further yet.
bump
Sorry, I don't have any funding to work on this.