Parse internal DTDs in doctype declaration
Up to now it was documented that internal DTDs inside the doctype declaration confuse HTML::Parser.
Depending on the content of that internal DTD, the parser would either return a text token instead, or return a declaration token that also swallowed many of the elements and text following the syntactically correct declaration.
The old implementation did accept an empty internal DTD, such as:
<!DOCTYPE abc SYSTEM "abc.dtd" [] >
This patch allows non-empty internal DTDs inside the square brackets of the doctype declaration and returns the whole internal DTD as a single token in the token list, just as the old implementation returned a token containing only "[]". For example, it now correctly parses:
<!DOCTYPE abc SYSTEM "abc.dtd"[
<!-- even a simple comment here would confuse it -->
<!-- or quoted strings with special chars like ]> -->
<!ENTITY confuse "]>">
] >
<abc>Hello world</abc>
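
To illustrate why comments and quoted strings matter when looking for the closing bracket, here is a minimal, self-contained Perl sketch. It is not the code in this patch and does not use HTML::Parser; `internal_subset_end` is a hypothetical helper, and it assumes a simplified grammar in which only `<!-- ... -->` comments and quoted literals can hide a `]`.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Parser: given a document and the
# offset just past the opening "[", return the offset of the matching "]",
# or -1 if the internal subset is unterminated.
sub internal_subset_end {
    my ($doc, $pos) = @_;
    my $len = length $doc;
    while ($pos < $len) {
        if (substr($doc, $pos, 4) eq '<!--') {
            # Skip a whole comment so "]" or quotes inside it are ignored.
            my $end = index($doc, '-->', $pos + 4);
            return -1 if $end < 0;
            $pos = $end + 3;
            next;
        }
        my $ch = substr($doc, $pos, 1);
        if ($ch eq '"' || $ch eq "'") {
            # Skip a quoted literal up to its matching quote character.
            my $end = index($doc, $ch, $pos + 1);
            return -1 if $end < 0;
            $pos = $end + 1;
            next;
        }
        return $pos if $ch eq ']';
        $pos++;
    }
    return -1;
}

my $doc = <<'END';
<!DOCTYPE abc SYSTEM "abc.dtd" [
<!-- even a simple comment here would confuse it -->
<!-- or quoted strings with special chars like ]> -->
<!ENTITY confuse "]>">
] >
<abc>Hello world</abc>
END

my $open   = index($doc, '[');
my $close  = internal_subset_end($doc, $open + 1);
my $subset = substr($doc, $open, $close - $open + 1);
print $subset, "\n";    # the bracketed internal DTD as one string
```

A naive search for the first `]>` would stop inside the second comment; skipping comments and quoted literals first is what lets the whole bracketed block be treated as one unit.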
Paul (Ten years after my previous small patch, I am still using this very nice Perl module, one of the few that allows sane parsing of SGML-like files with errors in them.)
Wait a moment -- there is still a bug.
OK. It now correctly parses comments in all the ways they can appear inside the internal DTD. Could you have a look now? Feedback welcome.