A potential manual solution to Monorepos with Path Dependencies
Hi @ofek, thanks for the great tool!
I've come up with a solution proposal for a known problem I'm facing, and I would like to know your thoughts on it and whether it's possible to implement in PyApp
I've tried to be as detailed as possible, and I'm open to any questions or suggestions, thanks!
Update: It worked!
I've hidden the R&D process in this comment, keeping only what matters below:
Context, Research and Development (old comment)
Context
I have a pretty standard monorepo structure that provides a main package for all the other projects, but the main package doesn't know about them
Each project refers to the monorepo using path dependencies, and projects might even refer to other projects as well
Note: I'm doing this to separate dependencies, e.g. not all projects need pytorch
It all works nicely under development mode… until I want to build a release of any project
In the past, I implemented convoluted solutions using PyInstaller or Nuitka, which ended up working to a certain extent but weren't ideal (long story), so I decided to give PyApp a try
Problem
As I saw somewhere, Python wheels aren't standardized for path dependencies (yet?), so whenever building from such a pyproject.toml, the wheel won't be installable on other machines, as the builder's local path is hardcoded
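For illustration, you can see the hardcoded path in the built wheel's metadata (the wheel filename here is hypothetical):

```python
from zipfile import ZipFile

# Open a wheel built from a project that has a Poetry path dependency
with ZipFile("dist/project-1.0.0-py3-none-any.whl") as wheel:
    metadata = next(n for n in wheel.namelist() if n.endswith("METADATA"))
    # Expect a line like: Requires-Dist: broken @ file:///home/me/monorepo
    print(wheel.read(metadata).decode())
```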
I don't really want to upload the code to PyPI, as it is very specific to my use case, much like other monorepo opinions; even if that were the case, Poetry can't simultaneously have a versioned dependency in the main section and a path dependency in the dev section for the same package
Ultimately, this yields either spaghetti solutions or none at all
What I have tried
I've spent two days of intensive digging through the documentation and issues everywhere, trying many build backends such as Poetry, Hatch, PDM and proposed Poetry plugins or solutions, but ultimately I couldn't get it to work. Raw PyApp was the closest I got (I know it's used in Hatch!)
Attempt 1: Source distribution
I honestly don't remember much of what I tried yesterday, but I can say this wasn't ideal: including the packages as an sdist isn't "safe", defining the glob imports is annoying, and since the monorepo package isn't under a subpath of the projects, Poetry fails
Attempt 2: Custom distribution
Long story short, I zipped Poetry's virtual environment, set the proper relative paths on PyApp's variables for the executables, and used the full isolated mode, skipping installation
It fails as the Python included there is a symlink to the system Python. Setting poetry.virtualenvs.options.always-copy to true didn't help either (consider this a bug report? To embed some proper Python distribution on top of a local one?)
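You can check this in a couple of lines (assuming a Linux venv layout):

```python
from pathlib import Path

# Poetry's venv Python is typically a symlink back to the system interpreter,
# so zipping and shipping the venv breaks on other machines
python = Path(".venv/bin/python")
print(python.is_symlink(), "->", python.resolve())
```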
I'm not a fan of this solution, as your fetching and installation of the Python distribution feels more reliable and arguably universal
Attempt 3: Hatch
I ported the pyproject.toml to Hatch syntax and force-included the main package Broken, under ../../Broken, into the wheel. The embedding I'm using is PYAPP_PROJECT_PATH pointing at the built wheel
This failed as I didn't "inherit" the dependencies of the main package: the virtual environment properly contained the ShaderFlow and Broken packages, but not the dependencies of Broken (the monorepo root's package)
This solution feels non-ideal, as I had to unset safety flags on Hatch, like allowing direct references and, well, including some other package in the wheel
Proposed solution
After all the digging, I think this could be solved by the following:
- Have the path dependencies as `dev-dependencies` on the `pyproject.toml` of the project:
```toml
[tool.poetry.dependencies]
python = ">=3.10,<3.13"
moderngl = "^5.8.2"
# ...

[tool.poetry.dev-dependencies]
broken = {path="../../", develop=true}
```
Building a wheel for this project won't include the Broken package, but that's ok
- Find all path dependencies and build their wheels, recursively
This isn't something you can implement in PyApp, but a process users would need to define on their own
A pseudo-code implementation would be something like this (I didn't run or test the logic):
```python
import subprocess
from pathlib import Path
from typing import Optional, Set

import toml
from dotmap import DotMap

def build_projects(path: Path, found: Optional[Set[Path]] = None) -> Set[Path]:
    path = Path(path).resolve()

    # Initialize an empty set on the first call
    found = found if (found is not None) else set()

    # Skip if already visited or if no pyproject.toml exists
    if (path in found) or not (path/"pyproject.toml").exists():
        return found

    # Load the pyproject.toml dictionary
    pyproject = DotMap(toml.loads((path/"pyproject.toml").read_text()))

    # Iterate and find all path dependencies
    for name, dependency in pyproject.tool.poetry["dev-dependencies"].items():
        # Keep only {path=...} dictionaries, skip plain version strings
        if isinstance(dependency, str) or not dependency.path:
            continue

        # Dependency is a path, relative to this project
        dependency = (path/dependency.path).resolve()
        found.add(dependency)

        # Build the wheel
        subprocess.run(["poetry", "build", "--format", "wheel"], cwd=dependency)

        # Recursively find more path dependencies
        build_projects(dependency, found)

    return found

# Build all wheels
projects = build_projects(Path.cwd())

# We can now get the wheels from the projects
wheels = [next(project.glob("dist/*.whl")) for project in projects]
```
Why do we need all of this? For the next step of the proposed solution
- Include all the built wheels as installation dependencies on PyApp
We still use the main project's wheel on the PYAPP_PROJECT_PATH setting when building, and we would include the other wheels on a new PYAPP_LOCAL_DEPENDENCIES setting, or another name of your preference
```python
import os

# Include the other local dependencies on the building step
os.environ["PYAPP_LOCAL_DEPENDENCIES"] = ":".join(map(str, wheels))

# Compile the project
subprocess.run(["pyapp", ...])
```
When PyApp is installing the virtual environment, it would install the main project's wheel and the other local wheels as well
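Conceptually, the bootstrap would then be roughly equivalent to this pip call inside the venv (just a sketch; the wheel names are hypothetical placeholders for whatever PyApp embedded at build time):

```python
import subprocess
import sys

# Hypothetical names: the wheels PyApp embedded at build time
main_wheel = "shaderflow-1.0.0-py3-none-any.whl"
local_wheels = ["broken-1.0.0-py3-none-any.whl"]

# Install the main project and every local dependency into the venv
subprocess.run(
    [sys.executable, "-m", "pip", "install", main_wheel, *local_wheels],
    check=True,
)
```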
Why would it work?
By using the path dependency projects as development dependencies, we:
- Have them in editable mode when developing;
- Don't include the hard-coded path on the wheel;
- Include all the standard versioned dependencies they use
By installing the wheel of the main project and all the other path dependency wheels, we:
- Would have all the standard dependencies installed on the virtual environment;
- Plus the other local packages' code;
- And their metadata for importlib
My intuition says that this would work great!
• So, (...)
I had the smart-dumb-est idea: build all the path dependencies to .whl, then move these .whls inside a Resources folder on the target project, accessed via importlib.resources.files. That way, when building the target project to a .whl, all the path dependency .whls end up inside the built .whl
At runtime, if on a PyApp release, we iterate over all the .whls in the resources and install them with pip
that's too many wheels 🧀
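A minimal sketch of that runtime step, assuming the embedded wheels live in a Resources folder inside a placeholder package named project:

```python
import subprocess
import sys
from importlib.resources import files

def install_bundled_wheels() -> None:
    # "project" and "Resources" are placeholders for the target package's layout
    resources = files("project").joinpath("Resources")

    # Install every wheel that was embedded at build time
    for item in resources.iterdir():
        if item.name.endswith(".whl"):
            subprocess.run(
                [sys.executable, "-m", "pip", "install", str(item)],
                check=True,
            )
```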
• Proof of concept code
You can see the build and wheel-embedding function in this file, ~~and the hacky code to install all the wheels at runtime can be seen on the project's __init__.py file~~ Update: I had permalinked old code that wasn't elegant; I've removed it in recent commits
• It worked!
I've run it myself on Linux and Windows, and asked friends on both systems to test the built PyApp binaries
I can confirm everything worked very well!
We could all run it, render videos, and load pytorch (even with CUDA!); directories and package metadata are nominal 🎉
I've found some time to attempt a proof-of-concept implementation in PyApp itself
I've forked the repo and fixed my release functions on the monorepo; I might still change or tweak stuff
I'll probably not PR this on my own, as there are some safety and panic concerns in my Rust implementation. I've only coded a couple of months in the language before, and there are nuances of your code where you'd do a much better job :)
I mostly eyeballed what PYAPP_PROJECT_PATH was doing and tried the same embedding and bootstrap
I'm closing this, as a (very decent) workaround with Hatch to build a single wheel that bundles everything is the following:
```toml
# Use a single venv for all projects, and "import alpha" works
# Note that each of those directories contains a mostly empty pyproject.toml;
# just make sure they also use hatchling, are managed, and include your package
[tool.rye.workspace]
members = [
    "projects/alpha",
    "projects/beta",
    # ...
]

# Include all project packages with their __init__.py on the wheel as-is
# Note: We aren't "building" them, just bundling the source code
[tool.hatch.build.targets.wheel]
only-include = [
    "shared_library",
    "projects/alpha/alpha",
    "projects/beta/beta",
    # ...
]

# Rewrite paths on the wheel so they are plain - "import alpha" works,
# instead of "import projects.alpha.alpha"
[tool.hatch.build.targets.wheel.sources]
"projects/alpha/alpha" = "alpha"
"projects/beta/beta" = "beta"
# ...
```
Then, build a local wheel or upload to PyPI and compile normally with PyApp. I find that using PYAPP_PROJECT_PATH=str(wheel) built by rye build and setting PYAPP_EXEC_SPEC="alpha.__main__:main" works best for me. Also, PYAPP_UV_ENABLED=1 is so fast 😉
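For reference, a build script following that recipe could look like this (a sketch; the PyApp compile invocation itself is elided as before):

```python
import os
import subprocess
from pathlib import Path

# Build the single bundled wheel with rye (hatchling backend)
subprocess.run(["rye", "build", "--wheel"], check=True)
wheel = next(Path("dist").glob("*.whl"))

# Point PyApp at the local wheel and the project's entry point
os.environ["PYAPP_PROJECT_PATH"] = str(wheel)
os.environ["PYAPP_EXEC_SPEC"] = "alpha.__main__:main"
os.environ["PYAPP_UV_ENABLED"] = "1"

# Compile normally with PyApp (invocation elided)
# subprocess.run(["pyapp", ...])
```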
I have a PyPI wheel of my projects done this way as a proof of concept
There's a side issue with this: no project library will contain its own metadata (dist-info), so importlib.metadata on any of alpha, beta, ... will fail (metadata can be read from the shared lib alone, which is what I am doing). Plus, the venv will be the same for all projects, which isn't technically an issue, unless you depend on some custom resource file per binary (probably solvable by hard-coding envs), or on code updates shipped within a same-version wheel
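To illustrate the limitation, assuming shared_library is also the wheel's project name:

```python
from importlib.metadata import PackageNotFoundError, version

# The bundled packages ship no dist-info of their own in the single wheel
try:
    print(version("alpha"))
except PackageNotFoundError:
    # Fall back to the shared library, which owns the wheel's metadata
    print(version("shared_library"))
```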
I hope this helps anyone for future reference