A potential manual solution to Monorepos with Path Dependencies
Hi @ofek, thanks for the great tool!
I've come up with a solution proposal for a known problem I'm facing, and I would like to know your thoughts on it and whether it's possible to implement in PyApp
I've tried to be as detailed as possible, and I'm open to any questions or suggestions, thanks!
Update: It worked!
I've hidden the R&D process in this comment, keeping only what matters below:
Context, Research and Development (old comment)
Context
I have a pretty standard monorepo structure that provides a main package for all the other projects, but the main package doesn't know about them
Each project refers to the monorepo using path dependencies, and projects might even refer to other projects as well
Note: I'm doing this to separate dependencies, e.g. not all projects need pytorch
It all works nicely under development mode… until I want to build a release of any project
In the past, I implemented convoluted solutions using PyInstaller or Nuitka, which ended up working to a certain extent but weren't ideal (long story), so I decided to give PyApp a try
Problem
As I saw somewhere, Python wheels aren't standardized for path dependencies (yet?), so whenever building from such a pyproject.toml, the wheel won't be installable on other machines, as the builder's local path is hardcoded
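For illustration, you can see the hardcoded path in the built wheel's metadata (the wheel filename here is hypothetical):

```python
from zipfile import ZipFile

# Open a wheel built from a project that has a Poetry path dependency
with ZipFile("dist/project-1.0.0-py3-none-any.whl") as wheel:
    metadata = next(n for n in wheel.namelist() if n.endswith("METADATA"))
    # Expect a line like: Requires-Dist: broken @ file:///home/me/monorepo
    print(wheel.read(metadata).decode())
```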
I don't really want to upload the code to PyPI, as it is very specific to my use case, much like other monorepo opinions; even if that were the case, Poetry can't simultaneously have a versioned dependency in the main section and a path dependency in the dev section for the same package
Ultimately, this yields either spaghetti solutions or none at all
What I have tried
I've spent two days of intensive digging through the documentation and issues everywhere, trying many build backends such as Poetry, Hatch, PDM and proposed Poetry plugins or solutions, but ultimately I couldn't get it to work. Raw PyApp was the closest I got (I know it's used in Hatch!)
Attempt 1: Source distribution
I honestly don't remember much of what I tried yesterday, but I can say this wasn't ideal: including the packages as an sdist isn't "safe", defining the glob imports is annoying, and since the monorepo package isn't under a subpath of the projects, Poetry fails
Attempt 2: Custom distribution
Long story short, I zipped Poetry's virtual environment, set the proper relative paths on PyApp's variables for the executables, and used the full isolated mode, skipping installation
It fails as the Python included there is a symlink to the system Python. Setting poetry.virtualenvs.options.always-copy to true didn't help either (consider this a bug report? To embed some proper Python distribution on top of a local one?)
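You can check this in a couple of lines (assuming a Linux venv layout):

```python
from pathlib import Path

# Poetry's venv Python is typically a symlink back to the system interpreter,
# so zipping and shipping the venv breaks on other machines
python = Path(".venv/bin/python")
print(python.is_symlink(), "->", python.resolve())
```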
I'm not a fan of this solution, as your fetching and installation of the Python distribution feels more reliable and arguably universal
Attempt 3: Hatch
I ported the pyproject.toml to Hatch syntax and force-included the main package Broken, under ../../Broken, into the wheel. The embedding I'm using is PYAPP_PROJECT_PATH pointing at the built wheel
This failed as I didn't "inherit" the dependencies of the main package: the virtual environment properly contained the ShaderFlow and Broken packages, but not the dependencies of Broken (the monorepo root's package)
This solution feels non-ideal, as I had to unset safety flags on Hatch, like allowing direct references and, well, including some other package in the wheel
Proposed solution
After all the digging, I think this could be solved by the following:
- Have the path dependencies as `dev-dependencies` on the `pyproject.toml` of the project:
```toml
[tool.poetry.dependencies]
python = ">=3.10,<3.13"
moderngl = "^5.8.2"
# ...

[tool.poetry.dev-dependencies]
broken = {path="../../", develop=true}
```
Building a wheel for this project won't include the Broken package, but that's ok
- Find all path dependencies and build their wheels, recursively
This isn't something you can implement in PyApp, but a process users would need to define on their own
A pseudo-code implementation would be something like this (I didn't run or test the logic):
```python
import subprocess
from pathlib import Path
from typing import Optional, Set

import toml
from dotmap import DotMap

def build_projects(path: Path, found: Optional[Set[Path]] = None) -> Set[Path]:
    path = Path(path).resolve()

    # Initialize an empty set on the first call
    found = found if (found is not None) else set()

    # Skip if already visited or if no pyproject.toml exists
    if (path in found) or not (path/"pyproject.toml").exists():
        return found

    # Load the pyproject.toml dictionary
    pyproject = DotMap(toml.loads((path/"pyproject.toml").read_text()))

    # Iterate and find all path dependencies
    for name, dependency in pyproject.tool.poetry["dev-dependencies"].items():
        # Keep only {path=...} dictionaries, skip plain version strings
        if isinstance(dependency, str) or not dependency.path:
            continue

        # Dependency is a path, relative to this project
        dependency = (path/dependency.path).resolve()
        found.add(dependency)

        # Build the wheel
        subprocess.run(["poetry", "build", "--format", "wheel"], cwd=dependency)

        # Recursively find more path dependencies
        build_projects(dependency, found)

    return found

# Build all wheels
projects = build_projects(Path.cwd())

# We can now get the wheels from the projects
wheels = [next(project.glob("dist/*.whl")) for project in projects]
```
Why do we need all of this? For the next step of the proposed solution
- Include all the built wheels as installation dependencies on PyApp
We still use the main project's wheel on the PYAPP_PROJECT_PATH setting when building, and we would include the other wheels on a new PYAPP_LOCAL_DEPENDENCIES setting, or another name of your preference
```python
import os

# Include the other local dependencies on the building step
os.environ["PYAPP_LOCAL_DEPENDENCIES"] = ":".join(map(str, wheels))

# Compile the project
subprocess.run(["pyapp", ...])
```
When PyApp is installing the virtual environment, it would install the main project's wheel and the other local wheels as well
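Conceptually, the bootstrap would then be roughly equivalent to this pip call inside the venv (just a sketch; the wheel names are hypothetical placeholders for whatever PyApp embedded at build time):

```python
import subprocess
import sys

# Hypothetical names: the wheels PyApp embedded at build time
main_wheel = "shaderflow-1.0.0-py3-none-any.whl"
local_wheels = ["broken-1.0.0-py3-none-any.whl"]

# Install the main project and every local dependency into the venv
subprocess.run(
    [sys.executable, "-m", "pip", "install", main_wheel, *local_wheels],
    check=True,
)
```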
Why would it work?
By using the path dependency projects as development dependencies, we:
- Have them in editable mode when developing;
- Don't include the hard-coded path on the wheel;
- Include all the standard versioned dependencies they use
By installing the wheel of the main project and all the other path dependency wheels, we:
- Would have all the standard dependencies installed on the virtual environment;
- Plus the other local packages' code;
- And their metadata for importlib
My intuition says that this would work great!
• So, (...)
I had the smart-dumb-est idea: build all the path dependencies to .whl, then move these .whls inside a Resources folder on the target project, accessed via importlib.resources.files. That way, when building the target project to a .whl, all the path dependency .whls end up inside the built .whl
At runtime, if on a PyApp release, we iterate over all the .whls in the resources and install them with pip
that's too many wheels 🧀
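A minimal sketch of that runtime step, assuming the embedded wheels live in a Resources folder inside a placeholder package named project:

```python
import subprocess
import sys
from importlib.resources import files

def install_bundled_wheels() -> None:
    # "project" and "Resources" are placeholders for the target package's layout
    resources = files("project").joinpath("Resources")

    # Install every wheel that was embedded at build time
    for item in resources.iterdir():
        if item.name.endswith(".whl"):
            subprocess.run(
                [sys.executable, "-m", "pip", "install", str(item)],
                check=True,
            )
```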
• Proof of concept code
You can see the build and wheel-embedding function in this file, ~~and the hacky code to install all the wheels at runtime can be seen on the project's __init__.py file~~ Update: I had permalinked old code that wasn't elegant; I've removed it in recent commits
• It worked!
I've run it myself on Linux and Windows, and asked friends on both systems to test the built PyApp binaries
I can confirm everything worked very well!
We could all run it, render videos, and load pytorch (even with CUDA!); directories and package metadata are nominal 🎉
I've found some time to attempt a proof-of-concept implementation in PyApp itself
I've forked the repo and fixed my release functions on the monorepo; I might still change or tweak stuff
I'll probably not PR this on my own, as there are some safety and panic concerns in my Rust implementation. I've only coded a couple of months in the language before, and there are nuances of your code where you'd do a much better job :)
I mostly eyeballed what PYAPP_PROJECT_PATH was doing and tried the same embedding and bootstrap
I'm closing this, as a (very decent) workaround with Hatch to build a single wheel that bundles everything is the following:
```toml
# Use a single venv for all projects, and "import alpha" works
# Note that each of those directories contains a mostly empty pyproject.toml;
# just make sure they also use hatchling, are managed, and include your package
[tool.rye.workspace]
members = [
    "projects/alpha",
    "projects/beta",
    # ...
]

# Include all project packages with their __init__.py on the wheel as-is
# Note: We aren't "building" them, just bundling the source code
[tool.hatch.build.targets.wheel]
only-include = [
    "shared_library",
    "projects/alpha/alpha",
    "projects/beta/beta",
    # ...
]

# Rewrite paths on the wheel so they are plain - "import alpha" works,
# instead of "import projects.alpha.alpha"
[tool.hatch.build.targets.wheel.sources]
"projects/alpha/alpha" = "alpha"
"projects/beta/beta" = "beta"
# ...
```
Then, build a local wheel or upload to PyPI and compile normally with PyApp. I find that using PYAPP_PROJECT_PATH=str(wheel) built by rye build and setting PYAPP_EXEC_SPEC="alpha.__main__:main" works best for me. Also, PYAPP_UV_ENABLED=1 is so fast 😉
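For reference, a build script following that recipe could look like this (a sketch; the PyApp compile invocation itself is elided as before):

```python
import os
import subprocess
from pathlib import Path

# Build the single bundled wheel with rye (hatchling backend)
subprocess.run(["rye", "build", "--wheel"], check=True)
wheel = next(Path("dist").glob("*.whl"))

# Point PyApp at the local wheel and the project's entry point
os.environ["PYAPP_PROJECT_PATH"] = str(wheel)
os.environ["PYAPP_EXEC_SPEC"] = "alpha.__main__:main"
os.environ["PYAPP_UV_ENABLED"] = "1"

# Compile normally with PyApp (invocation elided)
# subprocess.run(["pyapp", ...])
```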
I have a PyPI wheel of my projects done this way as a proof of concept
There's a side issue with this: no project library will contain its own metadata (dist-info), so importlib.metadata on any of alpha, beta, ... will fail (metadata can be read from the shared lib alone, which is what I am doing). Plus, the venv will be the same for all projects, which isn't technically an issue, unless you depend on some custom resource file per binary (probably solvable by hard-coding envs), or on code updates shipped within a same-version wheel
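To illustrate the limitation, assuming shared_library is also the wheel's project name:

```python
from importlib.metadata import PackageNotFoundError, version

# The bundled packages ship no dist-info of their own in the single wheel
try:
    print(version("alpha"))
except PackageNotFoundError:
    # Fall back to the shared library, which owns the wheel's metadata
    print(version("shared_library"))
```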
I hope this helps anyone for future reference