Better documentation and CLI help for `pdm install/lock/update/add` semantics

Open palotasb opened this issue 2 years ago • 1 comments

Our issue: Inadequate understanding of how PDM works

We're getting started with PDM at my company, switching over from pip freeze >requirements.txt, Poetry, and other solutions. I'm already a big fan of PDM after only using it for a few days because to me it seems like it's the only Python dependency management tool that does things correctly – working with standard pyproject.toml keys, resolving dependencies correctly, defaulting to platform- and Python-version-independent lockfiles, etc. Really great work @frostming and team!

However, both while I was converting our project to PDM and while my teammates were onboarding to the new PDM-based tooling, we realized that we don't have a full and correct understanding of the semantics of the various pdm subcommands and how they interact with pyproject.toml, pdm.lock, the various dependency groups (both in pyproject.toml and pdm.lock!), and the list of packages actually installed, or of what happens if any of these are out of sync.

We specifically hit issues #2124 / #2253 and didn't really understand what was causing the [PdmUsageError]: Requested groups not in lockfile errors or what we should do differently to resolve them.

We were adding new dependencies and dependency groups manually to pyproject.toml and running pdm install or pdm install -G :all, but that failed with: [PdmUsageError]: Requested groups not in lockfile. #2124 and #2253 are previously reported issues about this scenario.

Proposed solution: More explicit documentation

More explicit reference docs

I would like it if the CLI Reference and the --help output stated, for each command, exactly how it reads and/or changes each of these:

any packages or groups passed as command line arguments,
the project's pyproject.toml,
the lockfile pdm.lock,
the metadata.{groups, strategy} already present in the lockfile (if relevant), and
the actually installed Python packages.

As an example, the current documentation for pdm install says:

Install dependencies from lock file

I would propose expanding this so that the docs cover the full semantics of the command:

It should be mentioned that if there is no lockfile, pdm install will create one.
It should mention how dependencies listed in the lockfile but not in pyproject.toml are handled – and vice versa.
It should mention how already installed but unlisted packages are handled.
It should be mentioned that while groups passed via -G may refer to any group in pyproject.toml, and -G :all includes all groups from pyproject.toml, pdm install will only successfully install groups listed in the lockfile's metadata.groups section. (I actually think this behavior could also be enhanced, but I'll file a separate issue for that. This issue is just about very explicitly documenting the actual behavior and semantics.)

Because the above points are not listed, we were a bit confused about how the install/update/lock commands compare and interact.

Similar things could be said for other commands. We spent quite a bit of time looking at the CLI reference for pdm sync, pdm install, pdm add, and pdm lock, trying to understand what was happening and why our mental model and expectations didn't match the actual behavior.

An overview of PDM semantics/concepts

I would also like to propose that PDM adds a documentation page that explains PDM's mental model / semantics / core concepts (whatever you prefer to call it).

For example, PDM has the internal concepts of "groups added to the lockfile" (pdm.lock: metadata.groups) and "strategy defined in the lockfile" (metadata.strategy). One can figure out that these concepts exist if they start digging or experimenting, like we did, but it would be better if this were explained in the documentation. The reason these two concepts are interesting is that they make a PDM lockfile stateful in the sense that running a pdm <command> that updates the lockfile results in a lockfile that is not just a function of pyproject.toml + the command being run, but it also depends on the metadata already present in the lockfile. (Which is not what we naively expected.)

So as a second part of the proposed solution, I think PDM should document its own semantics and explain how the various commands work in terms of these semantics. Perhaps you consider some of these to be internal implementation details (e.g., the metadata inside the lockfile), but I would argue that they should be publicly documented because they affect the observable behaviour of PDM.

Contribution

Sorry for the wall of text! Overall we're still very happy with PDM and we hope to use it more!

I'm also open to contributing docs myself if that helps.

Mar 12 '24 16:03 palotasb

@palotasb Great idea!

Mar 18 '24 17:03 johnkangw