tests: test suite break when path contains `#`

Open hseg opened this issue 1 year ago • 10 comments

Building from https://aur.archlinux.org/packages/python-cyclonedx-lib, I get about 400 errors if the path to the project contains a # character. Log attached: error.log

hseg avatar Feb 14 '24 15:02 hseg

Could you provide a reproducible setup/example?

jkowalleck avatar Feb 14 '24 15:02 jkowalleck

dug into https://github.com/CycloneDX/cyclonedx-python-lib/files/14281372/error.log

line of interest:

======================================================================
ERROR: test_cases_render_valid_05_XML_1_5 (tests.test_enums.TestEnumComponentScope.test_cases_render_valid_05_XML_1_5)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/ddt.py", line 221, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/unittest/mock.py", line 1375, in patched
    return func(*newargs, **newkeywargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/tests/test_enums.py", line 256, in test_cases_render_valid
    super()._test_cases_render(bom, of, sv)
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/tests/test_enums.py", line 145, in _test_cases_render
    validation_errors = make_schemabased_validator(of, sv).validate_str(output)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/cyclonedx/validation/xml.py", line 61, in validate_str
    return self._validata_data(
           ^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/cyclonedx/validation/xml.py", line 67, in _validata_data
    validator = self._validator  # may throw on error that MUST NOT be caught
                ^^^^^^^^^^^^^^^
  File "/tmp/tmp#Le1C31tGnA/python-cyclonedx-lib/src/cyclonedx-python-lib-6.4.1/cyclonedx/validation/xml.py", line 91, in _validator
    self.__validator = XMLSchema(file=schema_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/xmlschema.pxi", line 89, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: element decl. '{http://cyclonedx.org/schema/bom/1.5}id', attribute 'type': The QName value '{http://cyclonedx.org/schema/spdx}licenseId' does not resolve to a(n) type definition., line 644

It appears that the file argument of XMLSchema() must not include a #: the path seems to be treated as a URL, so the # would be stripped.
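
To illustrate the suspected mechanism with the standard library only, the sketch below parses a path containing # the way a generic URL parser would, then percent-encodes it. The schema path is a hypothetical stand-in modeled on the traceback, and whether lxml/libxml2 splits the path in exactly this way, or accepts the encoded form, is an assumption that has not been verified here.

```python
from urllib.parse import quote, urlsplit

# Hypothetical schema location, modeled on the build directory in the traceback.
schema_path = "/tmp/tmp#Le1C31tGnA/schema/bom-1.5.xsd"

# A generic URL parser treats everything after the first '#' as a fragment.
parts = urlsplit(schema_path)
print(parts.path)      # the part a URL-based loader would try to open
print(parts.fragment)  # the part that would be dropped from the location

# Percent-encoding the path keeps the '#' as data (encoded as %23).
schema_uri = "file://" + quote(schema_path)
print(schema_uri)
```

If the path really is handled as a URL, the directory after # would be lost when relative schema imports are resolved, which would be consistent with the SPDX type not resolving in the error above.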

I need to check with the upstream lxml project (https://pypi.org/project/lxml/); maybe there is a fix, a workaround, or an existing report or advice.

jkowalleck avatar Feb 14 '24 15:02 jkowalleck

Thank you for your report, @hseg

Since this project is community-driven, and contributions are welcome, I would ask you: are you interested in providing a solution to the problem? If so, please let me know. Please see our contribution guideline.

jkowalleck avatar Feb 14 '24 15:02 jkowalleck

On Wed, Feb 14, 2024 at 07:16:32AM -0800, Jan Kowalleck wrote:

Could you give a reproducible setup/example?

Reproducing the steps run by pacman:

cd "$(mktemp -td tmp#XXXXX)"
wget https://github.com/CycloneDX/cyclonedx-python-lib/archive/refs/tags/v6.4.1.tar.gz
tar xzf v6.4.1.tar.gz
cd cyclonedx-python-lib-6.4.1/
find tests -name 'invalid-metadata-timestamp-*.json' -exec rm -v '{}' ';'
find tests -name 'valid-signatures-*.json' -exec rm -v '{}' ';'
python -m build --wheel --no-isolation
python -m venv --clear --system-site-packages .venv
source .venv/bin/activate
pip install --force-reinstall --no-deps dist/*.whl
python -m unittest discover -v
deactivate

By contrast, if the # in the invocation to mktemp there is replaced with ., everything passes.

hseg avatar Feb 14 '24 15:02 hseg

I don't have much free time at the moment. If no one has fixed this by 2024-03-17, I'll take it on.

hseg avatar Feb 14 '24 15:02 hseg

re: https://github.com/CycloneDX/cyclonedx-python-lib/issues/551#issuecomment-1944075667 Thanks for providing the setup snippet. This should make it easy for others to work on this issue.

re: https://github.com/CycloneDX/cyclonedx-python-lib/issues/551#issuecomment-1944079050 I flagged this issue as "help wanted" as in "this is free for contributors". Whoever wants to tackle it, just drop a note here to let the others know.

jkowalleck avatar Feb 14 '24 15:02 jkowalleck

I tried tackling this. One complication to look out for is that lxml makes heavy use of Cython, which I'm having trouble debugging, so I'm giving up on this one. One thing I noticed while playing around: what matters is the real path to the test files. In particular, a usable workaround is to keep a clone somewhere with a saner path and to symlink to it from the desired location.
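
The symlink workaround can be sketched as follows. All directory names are hypothetical and created under a throwaway temporary directory; the point is only that the link's own path contains #, while the real path it resolves to does not.

```python
import os
import tempfile

# Throwaway sandbox; every name below is a hypothetical placeholder.
base = tempfile.mkdtemp()

# The actual checkout lives at a path without special characters.
clone = os.path.join(base, "clone")
os.mkdir(clone)

# The location the build tooling insists on contains a '#'.
wanted_parent = os.path.join(base, "tmp#build")
os.mkdir(wanted_parent)
link = os.path.join(wanted_parent, "cyclonedx-python-lib")

# Point the desired location at the clone.
os.symlink(clone, link)

# Resolving the link yields the clone's real path.
resolved = os.path.realpath(link)
print(link)
print(resolved)
```

This only helps as long as the code under test resolves symlinks before handing paths to the XML tooling, which is what the observation about the real path suggests.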

hseg avatar Apr 30 '24 18:04 hseg

I've run a couple of exhaustive tests trying to find the forbidden patterns. Data is attached below. From it, I conjecture that besides #, it is also forbidden to have paths containing %xx where x are hexadecimal characters (though I've only checked lowercase...), except for the following accepted hexadecimal sequences:

  • %01..%19
  • %20, %22, %23, %25, %60
  • %80..%99

These can't be referring to control characters, otherwise e.g. %0a..%0f and %1a..%1f would've been accepted. However, I do note that the second sequence of accepted characters corresponds to

  • (space)
  • "
  • #
  • %
  • `

respectively. Perhaps someone can make better use of this data than me.

failures-sorted.log successes-sorted.log
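
For reference, the short sketch below decodes the second group of accepted sequences with the standard library and shows one way such candidate directory names could be enumerated. The tmp prefix and the enumeration are hypothetical; the method actually used to produce the attached logs is not stated in this thread.

```python
from urllib.parse import unquote

# The second group of accepted sequences reported above.
accepted = ["%20", "%22", "%23", "%25", "%60"]

# Decode each percent-escape into the character it stands for.
decoded = {seq: unquote(seq) for seq in accepted}
for seq, char in decoded.items():
    print(seq, repr(char))

# Hypothetical enumeration of directory names covering %01 through %ff
# (lowercase hex only, matching what was checked above).
candidates = [f"tmp%{n:02x}" for n in range(1, 256)]
print(len(candidates), candidates[0], candidates[-1])
```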

hseg avatar May 26 '24 10:05 hseg

looks like these chars are all XML special chars. makes sense, since the XML tooling crashes when it tries to resolve paths ...

jkowalleck avatar May 26 '24 13:05 jkowalleck

Right. I guess the weird part about this is that the XML tooling is embedding raw paths into the files it constructs. I would expect any such serialization/deserialization to include a quote/unquote step as well.
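
A minimal sketch of the quote/unquote expectation, using only urllib.parse. The path is a hypothetical example containing a literal %41; nothing here calls lxml, so it shows the general failure mode rather than what the XML tooling actually does.

```python
from urllib.parse import quote, unquote

# Hypothetical path whose directory name contains a literal "%41".
raw_path = "/tmp/build%41dir/schema.xsd"

# If a consumer unquotes a path that was never quoted, the name changes.
mangled = unquote(raw_path)
print(mangled)

# Quoting on the way in and unquoting on the way out preserves the path.
quoted = quote(raw_path)
round_tripped = unquote(quoted)
print(quoted)
print(round_tripped == raw_path)
```

The asymmetric case (unquote without a prior quote) would point at a directory that does not exist, which is one plausible reading of why most %xx sequences fail.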

hseg avatar May 26 '24 13:05 hseg