[FEATURE] Better and more complete parsing for `requirements.txt` files
We currently utilise pkg_resources.parse_requirements to parse lines from requirements.txt files.
This seemingly has a number of shortcomings, as several requirements.txt features are not supported. See:
- CycloneDX/cyclonedx-python#315
- CycloneDX/cyclonedx-python-lib#8
This feature will look at alternatives to the above method to attempt to support these other formats. CycloneDX/cyclonedx-python-lib#97 may have identified a candidate in requirements-parser - TBC.
We might need to see what pip install -r uses as a parser.
This is the parser most people know, and the reason why people expect the same features here, too.
Unfortunately, the parser is internal: pip._internal.req.parse_requirements - see https://github.com/pypa/pip/blob/main/src/pip/_internal/req/__init__.py
See further: https://github.com/pypa/pip/blob/main/src/pip/_internal/req/req_file.py
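pip's requirements-file handling does more than parse PEP 508 lines: it also deals with comments, backslash line continuations and option lines such as `-r`/`-c`. As a rough illustration only (stdlib, simplified, not pip's actual code), that preprocessing could look like the sketch below; `split_requirements_txt` and the comment regex are assumptions of this sketch.

```python
import re
from typing import List, Tuple

# Simplified comment rule: '#' at line start or after whitespace starts a comment,
# so URL fragments like '...repo.git#egg=foo' are left alone.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def split_requirements_txt(text: str) -> Tuple[List[str], List[str]]:
    """Return (requirement_lines, option_lines) from requirements.txt content.

    Hypothetical helper: joins backslash continuations, strips comments and
    blank lines, and treats lines starting with '-' as options ('-r other.txt').
    Per-requirement options such as '--hash=...' stay on their requirement line.
    """
    logical: List[str] = []
    buffer = ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # continuation: drop the backslash and keep accumulating
            buffer += line[:-1] + " "
            continue
        logical.append(buffer + line)
        buffer = ""
    if buffer:
        logical.append(buffer)

    requirements: List[str] = []
    options: List[str] = []
    for line in logical:
        # remove comments, then collapse runs of whitespace
        line = " ".join(COMMENT_RE.sub("", line).split())
        if not line:
            continue
        (options if line.startswith("-") else requirements).append(line)
    return requirements, options


sample = "# runtime deps\n-r base.txt\ngraphviz\nPyYAML>=6.0  # comment\nJinja2==3.1.2 \\\n    --hash=sha256:deadbeef\n"
reqs, opts = split_requirements_txt(sample)
# reqs == ['graphviz', 'PyYAML>=6.0', 'Jinja2==3.1.2 --hash=sha256:deadbeef']
# opts == ['-r base.txt']
```

Option lines like `-r base.txt` would then need their own handling (e.g. recursing into the referenced file), which is exactly the part `pkg_resources.parse_requirements` does not cover.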
The whole topic seems to be an issue because people don't read properly and confuse our requirements.txt capabilities with the ones they know from some project they use, without knowing what they are actually doing.
We should have cyclonedx-python-lib's requirements-parser comply with PEP 508 - as the README states - and that is it. Everything else can be implemented by volunteer contributors if they need additional features.
See https://github.com/CycloneDX/cyclonedx-python/discussions/319 for discussion.
The parser implementation lives in cyclonedx-python now - I will relocate this issue to that project.
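For reference, a PEP 508 name-based requirement is a name, optional extras, an optional version specifier and an optional environment marker. A deliberately simplified stdlib sketch of splitting a line into those parts follows; it does not validate specifiers or markers and does not treat `name @ url` forms specially. `parse_pep508_line` and `SimpleRequirement` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional, Tuple

# PEP 508 identifier: starts and ends with a letter/digit, '.', '_', '-' allowed inside
_NAME = r"[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?"
_REQ_RE = re.compile(
    rf"^\s*(?P<name>{_NAME})\s*"
    r"(?:\[(?P<extras>[^\]]*)\])?\s*"   # optional [extra1,extra2]
    r"(?P<spec>[^;]*?)\s*"              # optional version specifier (not validated)
    r"(?:;\s*(?P<marker>.*?))?\s*$"     # optional '; marker' (not validated)
)


class SimpleRequirement(NamedTuple):
    name: str
    extras: Tuple[str, ...]
    specifier: str          # '' when no version constraint is given
    marker: Optional[str]


def parse_pep508_line(line: str) -> SimpleRequirement:
    m = _REQ_RE.match(line)
    if m is None:
        raise ValueError(f"not a (simplified) PEP 508 requirement: {line!r}")
    extras = tuple(e.strip() for e in (m.group("extras") or "").split(",") if e.strip())
    return SimpleRequirement(m.group("name"), extras, m.group("spec"), m.group("marker"))
```

Note that a bare name such as `graphviz` is a complete, valid requirement here with an empty specifier, while pip options such as `-r other.txt` are rejected because they are not PEP 508 at all.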
An observation: no output is generated for requirements that lack a version specification, such as:
% cat requirements.txt
graphviz
PyYAML
Jinja2
Cerberus
% cyclonedx-bom -r -i requirements.txt --force
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! Some of your dependencies do not have pinned version !!
!! numbers in your requirements.txt !!
!! !!
!! -> graphviz !!
!! -> PyYAML !!
!! -> Jinja2 !!
!! -> Cerberus !!
!! !!
!! The above will NOT be included in the generated !!
!! CycloneDX as version is a mandatory field. !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
% pip show cyclonedx-bom
Name: cyclonedx-bom
Version: 3.5.0
Summary: CycloneDX Software Bill of Materials (SBOM) generation utility
Home-page: https://github.com/CycloneDX/cyclonedx-python
Author: Steven Springett
Author-email: [email protected]
License: Apache-2.0
Location: /usr/local/lib/python3.9/site-packages
Requires: cyclonedx-python-lib, packageurl-python, pip-requirements-parser, setuptools, toml
Required-by:
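The warning above comes down to a check of whether each requirement pins one exact version. A stdlib-only sketch of such a check, assuming "pinned" means a single `==` or `===` clause without a wildcard; `pinned_version` is a hypothetical helper, not cyclonedx-bom's implementation.

```python
import re
from typing import Optional

# one comparison clause: operator followed by a version string
_CLAUSE_RE = re.compile(r"^\s*(===|==|~=|!=|<=|>=|<|>)\s*(\S+?)\s*$")


def pinned_version(specifier: str) -> Optional[str]:
    """Return the exact version if `specifier` pins one, else None."""
    clauses = [c for c in specifier.split(",") if c.strip()]
    if len(clauses) != 1:
        return None  # no specifier at all, or a range such as '>=1,<2'
    m = _CLAUSE_RE.match(clauses[0])
    if m is None:
        return None
    op, version = m.groups()
    # '==1.*' matches many versions, so it is not treated as pinned here
    if op in ("==", "===") and not version.endswith("*"):
        return version
    return None
```

With this rule, `graphviz`, `PyYAML` and the other bare names in the example all yield `None`, which is why they end up in the warning instead of the BOM.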
One potential way of handling this is to use the pkg_resources package; however, this requires the dependencies to be installed:
% python
Python 3.9.13 (main, Aug 7 2022, 01:32:00)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pkg_resources
>>> pkg = pkg_resources.require("graphviz")[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py", line 909, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/local/lib/python3.9/site-packages/pkg_resources/__init__.py", line 795, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'graphviz' distribution was not found and is required by the application
Once the dependencies are installed, their installed version (and other metadata, such as the license) can be read:
% pip install -r requirements.txt
…
% python
Python 3.9.13 (main, Aug 7 2022, 01:32:00)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pkg_resources
>>> pkg = pkg_resources.require("graphviz")[0]
>>> try:
...     lines = pkg.get_metadata_lines('METADATA')
... except FileNotFoundError:
...     lines = pkg.get_metadata_lines('PKG-INFO')
...
>>> for line in lines:
...     if line.startswith('License:'):
...         print({' '.join(str(pkg).split(' ')[-2:]): line[9:]})
...
{'graphviz 0.20.1': 'MIT'}
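The same lookup can be done with the standard library's `importlib.metadata` (Python 3.8+) instead of `pkg_resources`; it likewise only works for distributions installed in the current environment. A minimal sketch, where `installed_info` is a hypothetical helper name:

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_info(dist_name: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (name, version, license) for an installed distribution, or None."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # same situation as pkg_resources.DistributionNotFound above
        return None
    meta = dist.metadata  # parsed METADATA (or PKG-INFO) headers
    return meta["Name"], dist.version, meta.get("License")
```

After `pip install -r requirements.txt`, calling `installed_info("graphviz")` would return the name, installed version and `License` header of that distribution; the concrete values depend on the environment.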
It seems incorrect not to include packages that don't have a defined version specification (i.e. that effectively always resolve to 'latest'). What are your thoughts on this?
With the latest changes in CDX 1.4, we could add these dependencies to the BOM. When rendering as CDX <= 1.3, the version should be an empty string.
I would consider this a breaking change, as it changes the output dramatically. Alternative: add a switch that allows including these non-versioned dependencies.
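The proposal above (omit the version for CDX 1.4, empty string for CDX <= 1.3, behind an opt-in switch) can be sketched with `xml.etree.ElementTree`. This is a minimal illustration only: namespaces and all other component fields are left out, and `component_element` and `include_unversioned` are hypothetical names, not existing cyclonedx-python-lib API.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def component_element(name: str, version: Optional[str], spec: Tuple[int, int],
                      include_unversioned: bool = False) -> Optional[ET.Element]:
    """Build a minimal <component> element, or None if it should be skipped."""
    if version is None and not include_unversioned:
        return None  # current behaviour: unpinned requirements are dropped
    comp = ET.Element("component", {"type": "library"})
    ET.SubElement(comp, "name").text = name
    if version is not None:
        ET.SubElement(comp, "version").text = version
    elif spec < (1, 4):
        # version is mandatory before CycloneDX 1.4: emit it as an empty string
        ET.SubElement(comp, "version").text = ""
    # CycloneDX >= 1.4: version is optional, so the element is simply omitted
    return comp
```

Keeping `include_unversioned` off by default preserves today's output, so the new behaviour only applies to users who opt in.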
Nevertheless, @CasperGN, to solve your use case you could follow https://cyclonedx-bom-tool.readthedocs.io/en/latest/usage.html?highlight=freeze#requirements to create a proper requirements.txt.
@jkowalleck thanks for the quick reply!
It's my understanding - which may be incorrect - that PEP 508 allows a requirements.txt entry to supply only the name of the dependency; that's at least what I gather from the abstract:
".. The job of a dependency is to enable tools like pip [1] to find the right package to install. Sometimes this is very loose - just specifying a name, and sometimes very specific - referring to a specific file to install...".
My goal is to cover all use cases for SBOM generation, so I need to support all variations of dependencies and the formats used to describe them. I tested the following change in requirements.py:
import sys

from pkg_resources import DistributionNotFound, require

for requirement in parsed_rf.requirements:
    name = requirement.link.url if requirement.is_local_path else requirement.name
    version = requirement.get_pinned_version or self.no_version_handler(requirement=requirement, pkg_name=name)
    ...

def no_version_handler(self, pkg_name: str, requirement: InstallRequirement) -> str:
    try:
        pkg = require(pkg_name)[0]
    except DistributionNotFound:
        print('Running -r with no pinned versions requires the dependencies to be installed. '
              'Run: `pip install -r requirements.txt` before running cyclonedx-bom.')
        sys.exit(1)
    # METADATA for dist-info (wheel) installs, PKG-INFO for egg-info installs
    try:
        lines = pkg.get_metadata_lines('METADATA')
    except FileNotFoundError:
        lines = pkg.get_metadata_lines('PKG-INFO')
    for line in lines:
        if line.startswith('License:'):
            # note: this returns the License value, not a version
            return line[9:]
    return str(requirement.dumps_specifier())
This yields the following BOM:
<?xml version="1.0" encoding="UTF-8"?>
<bom xmlns="http://cyclonedx.org/schema/bom/1.4" version="1" serialNumber="urn:uuid:0375246a-04da-4488-8fb3-bb0d77c49247">
  <metadata>
    ...
  </metadata>
  <components>
    <component type="library" bom-ref="f9233fb0-7c5f-4b53-a13f-1a3a95802229">
      <name>Cerberus</name>
      <version>ISC</version>
      <purl>pkg:pypi/cerberus@ISC</purl>
    </component>
    <component type="library" bom-ref="439355ab-955c-4850-9093-e33d50ba5521">
      <name>Jinja2</name>
      <version>BSD-3-Clause</version>
      <purl>pkg:pypi/jinja2@BSD-3-Clause</purl>
    </component>
    <component type="library" bom-ref="0dc1d956-35ef-4c46-9184-ce53de30e1ae">
      <name>PyYAML</name>
      <version>MIT</version>
      <purl>pkg:pypi/pyyaml@MIT</purl>
    </component>
    <component type="library" bom-ref="7ee47e91-cd0a-413b-b3f3-3fd1ae1f270c">
      <name>graphviz</name>
      <version>MIT</version>
      <purl>pkg:pypi/graphviz@MIT</purl>
    </component>
  </components>
  <dependencies>
    <dependency ref="f9233fb0-7c5f-4b53-a13f-1a3a95802229" />
    <dependency ref="439355ab-955c-4850-9093-e33d50ba5521" />
    <dependency ref="0dc1d956-35ef-4c46-9184-ce53de30e1ae" />
    <dependency ref="7ee47e91-cd0a-413b-b3f3-3fd1ae1f270c" />
  </dependencies>
</bom>
Would this still be considered a breaking change?
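In the BOM above, `<version>` and the purl carry license identifiers (ISC, BSD-3-Clause, MIT) because `no_version_handler`, as written, returns the `License:` metadata line. A minimal sketch of the presumably intended lookup, assuming the standard library's `importlib.metadata` and a pinned version already extracted from the requirement; `resolve_version` and `pypi_purl` are hypothetical helpers, and the purl builder does no percent-encoding.

```python
from importlib import metadata
from typing import Optional


def resolve_version(name: str, pinned: Optional[str]) -> Optional[str]:
    """Pinned version if any, else the installed version, else None."""
    if pinned:
        return pinned
    try:
        return metadata.version(name)  # the Version field, not License
    except metadata.PackageNotFoundError:
        return None


def pypi_purl(name: str, version: Optional[str]) -> str:
    # purl type 'pypi': name lowercased, '_' replaced by '-'
    purl = "pkg:pypi/" + name.lower().replace("_", "-")
    # without a known version, leave out '@version' instead of inventing one
    return purl + ("@" + version if version else "")
```

Returning `None` rather than exiting lets the caller decide whether to skip the component or, where the schema allows it, emit it without a version.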
v4 will use https://pypi.org/project/pip-requirements-parser/, which should close this issue.
fixed by #605
This feature will be part of the next/upcoming major release.
Changelog: see https://github.com/CycloneDX/cyclonedx-python/pull/605
Install via: pip install cyclonedx-bom==4.0.0rc1