Imprecise information on the `Challenges using setuptools` section.
Hello, this is a follow up on https://discuss.python.org/t/python-packaging-documentation-feedback-and-discussion/24833/78.
I believe that there are some imprecisions in the section: https://www.pyopensci.org/python-package-guide/package-structure-code/python-package-build-tools.html#challenges-using-setuptools
For example:
setuptools will build a project without a name or version if you are not using a pyproject.toml file to store metadata.
I don't know if I am understanding this correctly, but setuptools can derive the name/version information from any of the configuration files setup.py, setup.cfg or pyproject.toml. I am not sure why that would be problematic...
Setuptools also will include all of the files in your package repository if you do not explicitly tell it to exclude files using a MANIFEST.in file
By default setuptools will add to the distribution only a subset of the files in the package repository, not all of them. However, we do recommend that users use a plugin like setuptools-scm so the VCS can be used as the single source of information. With setuptools-scm the approach should be very similar to what hatch does (I believe it tries to parse .gitignore, but I might be wrong), or flit (when invoked as the `flit` CLI at least), and probably other backends.
My personal opinion is that MANIFEST.in is only needed if you want a high degree of customization and/or are not happy with using VCS (e.g. there are people who disagree on a conceptual level with using VCS info for builds).
There is some information about it on https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html.
hi @abravalheri !! thank you for this issue - so I think what surprised me when i built a package is by default, setuptools included the docs directory and some css files. so it seemed to include more than i was expecting in the sdist. If i recall correctly this was particularly true if i build locally it was also including the documentation build files (html and raw files). whereas in our clean github build it seems to work as expected - the sdist on pypi is not bad - it only includes the docs/ dir
This made me think we should always use a manifest with setuptools, given the default behavior of adding more than you might want in the sdist? for instance should we include markdown and rst documentation files in an sdist? Below is a screenshot - taken from a local build of stravalib which i've been testing this on. Notice that ALL files in the repo are included in the sdist. i worry a user would not include a manifest and have bloated sdist files like this. Then that burdens PyPI / warehouse as storage over time increases.
I think the name issue relates to the fact that setuptools doesn't check for a project name if you are using a setup.py or .cfg file to store metadata. it does check the name in the pyproject.toml file. this was a bug that someone told me about but because i don't use setup.py i don't have a project to test this. are you saying that if setup.py or setup.cfg metadata are missing a project name, setuptools will check to ensure that a project name is added before a build?
Many thanks helping me sort this all out!
Hi @lwasser, I would like to cover a few aspects regarding your comment. I am not sure if I will manage to provide a cohesive answer, but please find below my attempt:
- By default `setuptools` will include in the sdist (if they exist):
  - pyproject.toml, setup.cfg, setup.py
  - README, README{.rst,.md,.txt}
  - tests/test*.py and test/test*.py
  - the Python files that are part of the distribution
  - files pointed to by package_data, data_files
  - all C sources listed as part of extensions or C libraries in the setup script (it does not include C headers)
  - metadata files generated by setuptools (e.g. `PKG-INFO`, `entry-points.txt`)

  You can verify that in practice by running a small example:

> docker run --rm -it python:3.10 /bin/bash
mkdir -p /tmp/myproj
cd /tmp/myproj
mkdir -p src/mymod/
mkdir docs
mkdir tests
mkdir -p .github/workflows
touch src/mymod/__init__.py
touch docs/index.rst
touch docs/index.html
touch tests/test_mymod.py
touch .github/workflows/main.yaml
cat <<EOF > pyproject.toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[project]
name = "myproj"
version = "0.42"
EOF
python -m venv .venv
.venv/bin/python -m pip install -U build
.venv/bin/python -m build
tar tf dist/*.tar.gz
# myproj-0.42/
# myproj-0.42/PKG-INFO
# myproj-0.42/pyproject.toml
# myproj-0.42/setup.cfg
# myproj-0.42/src/
# myproj-0.42/src/mymod/
# myproj-0.42/src/mymod/__init__.py
# myproj-0.42/src/myproj.egg-info/
# myproj-0.42/src/myproj.egg-info/PKG-INFO
# myproj-0.42/src/myproj.egg-info/SOURCES.txt
# myproj-0.42/src/myproj.egg-info/dependency_links.txt
# myproj-0.42/src/myproj.egg-info/top_level.txt
# myproj-0.42/tests/
# myproj-0.42/tests/test_mymod.py

- As a remark, I would say that in general adding both `docs` and `tests` to the `sdist` is considered good practice (there is some disagreement, but my overall impression is that it is a 60%/40% split or further apart). You can see a discussion about this topic in https://discuss.python.org/t/should-sdists-include-docs-and-tests/14578.

- I imagine that the reason why you see `docs` in your project is because you are using `setuptools-scm`. `setuptools-scm` will tell setuptools to add all files tracked by the VCS into the sdist. I believe that you should not be seeing any "generated" .html file (unless you are adding those to your git repo for tracking, or forgot to configure `.gitignore`).

  Indeed, I personally recommend people to go for that solution, because I believe it:

  a. is easier
  b. will include by default `docs` and `tests` (which is kind of considered best practice)
  c. will automatically include any script and configuration file for tools used during development
  d. will automatically include examples
  e. will include everything that is needed for a developer to work with your project (effectively, your `sdist` will work as a snapshot of your project with added "Python package metadata").

  Some people don't like to see CI files in the `sdist` (which is a fair opinion). However, I personally don't mind those and I actually think they are useful (a developer might inspect your `.github/workflows/*.yml` to understand how to run the test suite). If you don't like certain files, you can trim out excesses with `MANIFEST.in`. In the same way, I believe most of the backends also have ad hoc solutions for this kind of customization.
The last point is a bit of personal opinion:
- Isn't the concern about including a few text files (e.g. `docs/*.rst`) in the sdist a bit of premature optimization? I believe that the main problem that PyPI has is due to large binary artefacts (e.g. compiled native libraries), especially if you need to produce multiple wheels (e.g. per-OS, per-architecture, ...).
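To put numbers on that, one option is to summarize what an sdist actually contains, grouped by top-level entry. The following is only a minimal sketch using the standard library `tarfile` module: the helper names `summarize_sdist` and `make_demo_sdist` are made up for illustration, and the demo archive is synthetic (built in memory) rather than the output of a real build.

```python
import io
import tarfile
from collections import defaultdict


def summarize_sdist(fileobj):
    """Return {top_level_entry: (file_count, total_bytes)} for a .tar.gz sdist."""
    summary = defaultdict(lambda: [0, 0])
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Drop the leading "<name>-<version>/" directory of the archive.
            parts = member.name.split("/")[1:]
            if not parts:
                continue
            # Directories get a trailing slash, top-level files keep their name.
            key = parts[0] + "/" if len(parts) > 1 else parts[0]
            summary[key][0] += 1
            summary[key][1] += member.size
    return {key: tuple(value) for key, value in summary.items()}


def make_demo_sdist():
    """Build a tiny synthetic archive in memory (hypothetical file contents)."""
    files = {
        "myproj-0.42/pyproject.toml": b"[project]\n",
        "myproj-0.42/src/mymod/__init__.py": b"",
        "myproj-0.42/docs/index.rst": b"Docs\n====\n",
        "myproj-0.42/docs/index.html": b"<html></html>\n",
    }
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    report = summarize_sdist(make_demo_sdist())
    for entry, (count, size) in sorted(report.items()):
        print(f"{entry:<20} {count:>3} file(s) {size:>6} bytes")
```

For a real project you would pass an opened file instead, e.g. `summarize_sdist(open("dist/myproj-0.42.tar.gz", "rb"))`, and compare the `docs/` total against the size of your wheels.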
I think the name issue relates to the fact that setuptools doesn't check for a project name if you are using a setup.py or .cfg file to store metadata. it does check the name in the pyproject.toml file. this was a bug that someone told me about but because i don't use setup.py i don't have a project to test this. are you saying that if setup.py or setup.cfg metadata are missing a project name, setuptools will check to ensure that a project name is added before a build?
Setuptools can automatically derive a project name if you don't specify one (if you are using pyproject.toml without [project], setup.cfg or setup.py).
For example:
> docker run --rm -it python:3.10 /bin/bash
mkdir -p /tmp/myproj
cd /tmp/myproj
mkdir -p src/mymod/
touch src/mymod/__init__.py
cat <<EOF > pyproject.toml
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"
EOF
python -m venv .venv
.venv/bin/python -m pip install -U build
.venv/bin/python -m build
ls dist/*.whl
# dist/mymod-0.0.0-py3-none-any.whl
or
> docker run --rm -it python:3.10 /bin/bash
mkdir -p /tmp/myproj
cd /tmp/myproj
mkdir -p src/mymod/
touch src/mymod/__init__.py
cat <<EOF > setup.py
from setuptools import setup
setup()
EOF
python -m venv .venv
.venv/bin/python -m pip install -U build
.venv/bin/python -m build
ls dist/*.whl
# dist/mymod-0.0.0-py3-none-any.whl
You can see in these examples that setuptools automatically derives the name mymod from the files in your project (it will also derive a "degenerate version": 0.0.0).
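As a rough mental model of the idea (and explicitly not setuptools' actual discovery code), the name can be thought of as "the single importable package or module found under `src/`". The sketch below is a simplified approximation written for this thread; the function name `guess_project_name` is hypothetical, and it only handles the `src/` layout used in the examples above.

```python
import tempfile
from pathlib import Path


def guess_project_name(project_root):
    """Simplified guess: the only package or module found under src/.

    Illustrative approximation only, NOT how setuptools is implemented.
    """
    src = Path(project_root) / "src"
    if not src.is_dir():
        return None
    candidates = sorted(
        entry.stem if entry.is_file() else entry.name
        for entry in src.iterdir()
        if (entry.is_dir() and (entry / "__init__.py").is_file())
        or (entry.is_file() and entry.suffix == ".py")
    )
    # Only derive a name when the choice is unambiguous.
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp) / "src" / "mymod"
        package.mkdir(parents=True)
        (package / "__init__.py").touch()
        print(guess_project_name(tmp))
```

In this sketch an ambiguous layout (two packages under `src/`) yields `None` instead of a guess; that is a design choice of the example, made to keep it conservative.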
There are some discussions that seem to associate the ability of setuptools to build projects with incomplete metadata with user confusion and problems. In my experience/opinion, there is no direct association. Instead, there are a few cases in which, somehow, the wrong version of setuptools ends up being used (e.g. leaks in the virtual environment created by the frontend, a wrong version of setuptools specified in pyproject.toml, or a missing pyproject.toml that causes any system-wide installation of setuptools to be used). Old versions of setuptools will not be able to read the information present in setup.cfg or pyproject.toml.
My personal opinion is that, if the user has a setup.py/pyproject.toml in a directory and they decide to actively run python -m build, they do want to build a Python package... So there is no reason for setuptools to get in the way; instead it should try to assist the user in the best way possible (e.g. by deriving the project name automatically).
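When debugging such a case, a quick first check is which setuptools version is installed in the environment you suspect is being picked up. This is a small sketch using the standard library `importlib.metadata`; the helper name `installed_version` is made up here, and note that it only reports on the interpreter it runs in, not on the isolated environment a build frontend may create.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="setuptools"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"setuptools in this environment: {found or 'not installed'}")
```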
You can see a few discussions on the topic in the following links:
- https://github.com/pypa/setuptools/issues/3765#issuecomment-1380752877
- https://github.com/pypa/setuptools/issues/3511#issuecomment-1402084244
Hi @lwasser, did you have any chance to have a look at the topics I discussed above?
hey @abravalheri thank you for following up. let me test again. i was still having issues with setuptools adding too many files by default but i may have done something wrong. so rather than sending you on a loop ... let me please test this out again this week. thank you so much for following up!
ok i've finally tested this (thank you for your patience) @abravalheri let's update our guide to ensure we have the behaviors around setuptools correct. Are you open to submitting a PR with the corrections by chance?
Many thanks!!
@all-contributors please add @abravalheri for code, design
@abravalheri i wondered if you could answer another question related to manifest.in file and setuptools asked here in our discourse. there has been some discussion around what to do with data files in a distribution that i suspect you could shed some light on. many thanks for considering this.