tests: test suite breaks when path contains `#`
When building from https://aur.archlinux.org/packages/python-cyclonedx-lib, I get ~400 errors if the path to the project contains a `#` character.
Log attached: error.log
Could you provide a reproducible setup/example?
I dug into https://github.com/CycloneDX/cyclonedx-python-lib/files/14281372/error.log; the relevant part is:
```
======================================================================
ERROR: test_cases_render_valid_05_XML_1_5 (tests.test_enums.TestEnumComponentScope.test_cases_render_valid_05_XML_1_5)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/ddt.py", line 221, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/unittest/mock.py", line 1375, in patched
    return func(*newargs, **newkeywargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/tests/test_enums.py", line 256, in test_cases_render_valid
    super()._test_cases_render(bom, of, sv)
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/tests/test_enums.py", line 145, in _test_cases_render
    validation_errors = make_schemabased_validator(of, sv).validate_str(output)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/cyclonedx/validation/xml.py", line 61, in validate_str
    return self._validata_data(
           ^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/cyclonedx/validation/xml.py", line 67, in _validata_data
    validator = self._validator # may throw on error that MUST NOT be caught
                ^^^^^^^^^^^^^^^
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/cyclonedx/validation/xml.py", line 91, in _validator
    self.__validator = XMLSchema(file=schema_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/xmlschema.pxi", line 89, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: element decl. '{http://cyclonedx.org/schema/bom/1.5}id', attribute 'type': The QName value '{http://cyclonedx.org/schema/spdx}licenseId' does not resolve to a(n) type definition., line 644
```
It appears that the `file` argument of `XMLSchema()` must not contain a `#`: the path is treated as a URL, so the `#` and everything after it get stripped.
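To illustrate why a `#` in a directory name can break the schema import, here is a sketch using Python's standard `urllib.parse`, which resolves references RFC 3986 style. This is only an analogy: lxml delegates to libxml2, whose resolver is a separate implementation. The paths are hypothetical, and the relative import name `spdx.xsd` is an assumption about how the CycloneDX schema pulls in its SPDX types. This fits the traceback, where the main schema is read (the error points at its line 644) but the SPDX types do not resolve.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical schema locations: one under a directory containing '#', one without.
bad_base = "/tmp/tmp#abc/schema/bom-1.5.xsd"
good_base = "/tmp/tmp.abc/schema/bom-1.5.xsd"

# Parsed as a URL, '#' starts a fragment, so the path is cut short at '/tmp/tmp'.
parts = urlsplit(bad_base)
print(parts.path)      # /tmp/tmp
print(parts.fragment)  # abc/schema/bom-1.5.xsd

# Resolving a relative import against each base.
print(urljoin(bad_base, "spdx.xsd"))   # /tmp/spdx.xsd  (wrong directory)
print(urljoin(good_base, "spdx.xsd"))  # /tmp/tmp.abc/schema/spdx.xsd
```

With the `#` present, the import is looked up two directories too high, so the referenced type definitions never load.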
I need to check with the upstream https://pypi.org/project/lxml/ project: maybe there is already a fix, a workaround, or an existing report with a solution or advice.
Thank you for your report, @hseg
Since this project is community-driven, and contributions are welcome, I would ask you: are you interested in providing a solution to the problem? If so, please let me know. Please see our contribution guideline.
On Wed, Feb 14, 2024 at 07:16:32AM -0800, Jan Kowalleck wrote:
> Could you give a reproducible setup/example?
Reproducing the steps run by pacman:
```sh
cd "$(mktemp -td tmp#XXXXX)"
wget https://github.com/CycloneDX/cyclonedx-python-lib/archive/refs/tags/v6.4.1.tar.gz
tar xzf v6.4.1.tar.gz
cd cyclonedx-python-lib-6.4.1/
find tests -name 'invalid-metadata-timestamp-*.json' -exec rm -v '{}' ';'
find tests -name 'valid-signatures-*.json' -exec rm -v '{}' ';'
python -m build --wheel --no-isolation
python -m venv --clear --system-site-packages .venv
source .venv/bin/activate
pip install --force-reinstall --no-deps dist/*.whl
python -m unittest discover -v
deactivate
```
By contrast, if the `#` in the `mktemp` invocation is replaced with `.`, everything passes.
On Wed, Feb 14, 2024 at 07:29:47AM -0800, Jan Kowalleck wrote:
> ... are you interested in providing a solution to the problem? ...
I don't have much free time at the moment; if no one has fixed this by 2024-03-17, I'll take it on.
re: https://github.com/CycloneDX/cyclonedx-python-lib/issues/551#issuecomment-1944075667 Thanks for providing the setup snippet. This should make it easy for others to work on this issue.
re: https://github.com/CycloneDX/cyclonedx-python-lib/issues/551#issuecomment-1944079050 I flagged this issue as "help wanted", as in "this is free for contributors". Whoever wants to tackle it, just drop a note here to let the others know.
I tried tackling this. One complication to look out for is that lxml makes heavy use of Cython, which I'm having trouble debugging, so I'm giving up on this one.
One thing I noticed while playing around with this: what matters is the real path to the test files. In particular, a usable workaround is to keep a clone somewhere with a saner path and put a symlink to it in the desired location.
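The symlink workaround can be sketched as follows. All directory names are hypothetical, and the code only shows that resolving the symlink yields the `#`-free location. It does not show how the test suite itself computes its paths. Presumably the suite ends up with the resolved path somewhere, for example because `os.getcwd()` reports the physical directory after `cd`-ing through a symlink on POSIX systems.

```python
import os
import tempfile

# Sketch of the workaround (names hypothetical): the real checkout lives at a
# '#'-free path, and the '#'-containing location is only a symlink to it.
with tempfile.TemporaryDirectory() as root:
    sane_clone = os.path.join(root, "sane-clone")
    os.mkdir(sane_clone)

    desired = os.path.join(root, "tmp#abc")
    os.symlink(sane_clone, desired, target_is_directory=True)

    # The path as typed still contains '#' ...
    via_link = desired
    # ... but resolving symlinks lands in the sane clone.
    resolved = os.path.realpath(desired)
    expected = os.path.join(os.path.realpath(root), "sane-clone")
```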
(Email reply to Jan Kowalleck's comment of 14 February 2024, 17:42:12 GMT+02:00: https://github.com/CycloneDX/cyclonedx-python-lib/issues/551#issuecomment-1944090882)
I've run a couple of exhaustive tests trying to find the forbidden patterns.
Data is attached below. From it, I conjecture that besides `#`, it is also forbidden for paths to contain `%xx`, where each `x` is a hexadecimal character (though I've only checked lowercase...), except for the following accepted sequences:

- `%01`..`%19`
- `%20`, `%22`, `%23`, `%25`, `%60`
- `%80`..`%99`

These can't be referring to control characters; otherwise, e.g., `%0a`..`%0f` and `%1a`..`%1f` would have been accepted. However, I do note that the second group of accepted sequences corresponds, respectively, to:

- (space)
- `"`
- `#`
- `%`
- `` ` ``

Perhaps someone can make better use of this data than me.
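The conjecture can be written down as a small predicate, for instance to pre-check a build directory before running the tests. This is a sketch that only restates the observed data, not a verified rule. The name `path_looks_unsafe` is hypothetical. Reading `%01..%19` and `%80..%99` as decimal-digit pairs only is an assumption, and so is treating uppercase escapes like lowercase ones, since only lowercase was tested.

```python
import re

# Escapes observed to be accepted. Assumption: the ranges cover decimal-digit
# pairs only, so %01..%19 excludes %0a..%0f and %1a..%1f.
ACCEPTED = (
    {f"%{n:02d}" for n in range(1, 20)}      # %01..%19
    | {"%20", "%22", "%23", "%25", "%60"}    # space, ", #, %, `
    | {f"%{n}" for n in range(80, 100)}      # %80..%99
)

# Any percent sign followed by two hex digits. Uppercase was not tested;
# normalizing it to lowercase below is an assumption.
PERCENT_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")


def path_looks_unsafe(path: str) -> bool:
    """Return True if `path` matches one of the conjectured forbidden patterns."""
    if "#" in path:
        return True
    return any(
        m.group(0).lower() not in ACCEPTED for m in PERCENT_ESCAPE.finditer(path)
    )
```

For example, `path_looks_unsafe("/tmp/tmp#abc")` and `path_looks_unsafe("/tmp/a%0ab")` are true, while `path_looks_unsafe("/tmp/a%20b")` is false.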
Looks like these characters are all XML special characters. That makes sense, since the XML tooling crashes when it tries to resolve paths...
Right. I guess the weird part is that the XML tooling embeds raw paths into the files it constructs. I would expect any such serialization/deserialization to include a quote/unquote step as well.
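Here is a sketch of the quote/unquote step described above, using Python's `urllib.parse`. It illustrates the idea only and is not what lxml or libxml2 actually do. The path is hypothetical and deliberately contains both `#` and `%0a`. Percent-encoding the filesystem path before using it as a URL turns `#` into `%23` and `%` into `%25`, so neither a fragment nor an accidental escape sequence can appear. Unquoting then recovers the original location.

```python
from urllib.parse import quote, unquote, urljoin, urlsplit

raw_path = "/tmp/tmp#abc%0a/schema/bom-1.5.xsd"  # hypothetical awkward path

# Quote before embedding the path in a URL: '#' -> %23, '%' -> %25, '/' kept.
base_url = "file://" + quote(raw_path)
print(base_url)  # file:///tmp/tmp%23abc%250a/schema/bom-1.5.xsd

# Relative references now resolve inside the real directory ...
import_url = urljoin(base_url, "spdx.xsd")
print(import_url)  # file:///tmp/tmp%23abc%250a/schema/spdx.xsd

# ... and unquoting the URL path yields the original filesystem location.
print(unquote(urlsplit(import_url).path))  # /tmp/tmp#abc%0a/schema/spdx.xsd
```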
(Email reply to Jan Kowalleck's comment of 26 May 2024, 16:44:33 GMT+03:00: https://github.com/CycloneDX/cyclonedx-python-lib/issues/551#issuecomment-2132229479)