Regression on parsing invalid URLs
As a continuation of #2377, we have a regression in parsing invalid URLs. Previously, urllib was much more liberal in processing URLs; now it rejects many more cases.
We use it to sanitize URLs, and html_parser is an example of a bot whose tests rely on this liberal behavior:
https://github.com/certtools/intelmq/blob/61c45acfb8cc60e1419abe7c57691561ef9ee072/intelmq/tests/bots/parsers/html_table/test_parser_column_split.py#L47
https://github.com/certtools/intelmq/blob/61c45acfb8cc60e1419abe7c57691561ef9ee072/intelmq/tests/bots/parsers/html_table/test_parser_column_split.py#L73-L80
In patched Python versions (e.g. 3.11.4), this URL is rejected. We need to either decide against allowing such URLs or redesign our sanitization.
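To make the first option concrete, here is a minimal sketch. It assumes a hypothetical helper `sanitize_url`, which is not IntelMQ's actual sanitization code. The helper delegates to `urllib.parse.urlsplit` and treats any `ValueError` as "invalid URL". With this strict approach, the set of accepted URLs simply follows whatever the running Python accepts. Keeping the old liberal behavior would instead require our own handling of the cases urllib now refuses.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def sanitize_url(value: str) -> Optional[str]:
    """Hypothetical helper: normalise a URL with urllib, or return None.

    Which inputs raise ValueError depends on the interpreter's patch
    level, so results can differ between e.g. 3.11.3 and 3.11.4.
    """
    try:
        parts = urlsplit(value.strip())
    except ValueError:
        # Strict option: anything urllib refuses is treated as invalid.
        return None
    if not parts.scheme or not parts.netloc:
        # Not an absolute URL (no scheme or host), reject as well.
        return None
    return urlunsplit(parts)
```

An unclosed IPv6 bracket such as `http://[::1` is rejected on every Python 3 version. Other inputs may only be rejected on patched releases.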
Temporarily, the test is skipped to unblock other work.
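The urllib fix was backported across several release branches. A version-number check may therefore be imprecise, and probing the interpreter directly could be more robust. The following is a sketch only. `PROBE_URL` is a hypothetical stand-in, not the URL from the linked test, and `urllib_accepts` and `TestLiberalUrl` are illustrative names.

```python
import unittest
from urllib.parse import urlsplit

# Hypothetical stand-in; replace with the URL used by the skipped test.
PROBE_URL = "http://[abc]/"


def urllib_accepts(url: str) -> bool:
    """Return True if this interpreter's urllib.parse.urlsplit parses url."""
    try:
        urlsplit(url)
    except ValueError:
        return False
    return True


class TestLiberalUrl(unittest.TestCase):
    # The condition is evaluated once, when the class is defined.
    @unittest.skipUnless(urllib_accepts(PROBE_URL),
                         "urllib on this Python rejects the probe URL")
    def test_liberal_url(self):
        # Placeholder body: the real test would assert the parser's output.
        self.assertTrue(urllib_accepts(PROBE_URL))
```

With this approach, the test runs automatically on interpreters that still accept the URL. It is skipped elsewhere, with no hard-coded list of patched versions.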