
Possible bug parsing the DOCTYPE of an HTML 4.01 Transitional document

Open • langdonr621 opened this issue 1 year ago • 0 comments

Hi there,

I'm trying to use your library (html_parser 0.7.0) but am running into a problem when parsing pages such as https://www.unicode.org/reports/tr29/#Word_Boundaries. The error is similar to:

    running 1 test
    Failed :(  --> 1:23
      |
    1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
      |                       ^---
      |
      = expected attribute key

Here's a test function that reproduces the problem:


    #[test]
    fn test_html_parser() {
        const HTML: &str = r#"<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<html></html>"#;

        if let Err(x) = html_parser::Dom::parse(HTML) {
            println!("Failed :( {}", x)
        }
    }

This Wikipedia page states that the declaration's syntax is valid.
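For comparison, here is a simplified recognizer I sketched for the legacy form `<!DOCTYPE name PUBLIC "public-id" "system-id">`. It is hypothetical and is not html_parser's grammar. It treats the public and system identifiers as bare quoted literals separated by any whitespace, including the newline in the test above. The error's column 23 lands on the opening quote of the public identifier, which makes me suspect the grammar reads `HTML` and `PUBLIC` as attribute keys and has no rule for a bare quoted string. That is a guess from the message, not from reading the grammar. The sketch also ignores `SYSTEM`-only declarations and internal subsets.

```rust
/// Parts of a DOCTYPE declaration (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct Doctype<'a> {
    pub name: &'a str,
    pub public_id: Option<&'a str>,
    pub system_id: Option<&'a str>,
}

/// Reads a quoted literal ("..." or '...') at the start of `s`,
/// returning the literal's contents and the remaining input.
fn quoted(s: &str) -> Option<(&str, &str)> {
    let quote = s.chars().next().filter(|c| *c == '"' || *c == '\'')?;
    let rest = &s[1..];
    let end = rest.find(quote)?;
    Some((&rest[..end], &rest[end + 1..]))
}

/// Parses `<!DOCTYPE name [PUBLIC "pubid" ["sysid"]]>` at the start of `input`.
/// Keywords match case-insensitively; any whitespace (including newlines)
/// may separate the parts.
pub fn parse_doctype(input: &str) -> Option<Doctype<'_>> {
    let s = input.trim_start();
    let (keyword, rest) = (s.get(..9)?, s.get(9..)?);
    if !keyword.eq_ignore_ascii_case("<!DOCTYPE") {
        return None;
    }
    // Root element name: everything up to the next whitespace or '>'.
    let rest = rest.trim_start();
    let name_end = rest.find(|c: char| c.is_whitespace() || c == '>')?;
    let (name, rest) = rest.split_at(name_end);
    if name.is_empty() {
        return None;
    }
    let mut rest = rest.trim_start();
    let mut public_id = None;
    let mut system_id = None;
    // Optional PUBLIC keyword, a required quoted public identifier,
    // then an optional quoted system identifier.
    if rest.get(..6).map_or(false, |k| k.eq_ignore_ascii_case("PUBLIC")) {
        let (id, after) = quoted(rest[6..].trim_start())?;
        public_id = Some(id);
        rest = after.trim_start();
        if let Some((id, after)) = quoted(rest) {
            system_id = Some(id);
            rest = after.trim_start();
        }
    }
    // The declaration must be closed by '>'.
    if !rest.starts_with('>') {
        return None;
    }
    Some(Doctype { name, public_id, system_id })
}
```

Run on the `HTML` constant from the test, this yields name `HTML`, public identifier `-//W3C//DTD HTML 4.01 Transitional//EN` and system identifier `http://www.w3.org/TR/html4/loose.dtd`.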

Am I correct in assuming this is a bug in the grammar rules the parser uses?

If not, I would appreciate it if you could suggest a workaround.
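In the meantime, one stopgap I'm considering is to cut the DOCTYPE off before parsing. I haven't verified it against html_parser itself. The helper below, `strip_doctype`, is hypothetical. It assumes the declaration is the first thing in the input and that no `>` appears inside its quoted identifiers. The document type information is simply discarded.

```rust
/// Hypothetical workaround helper: returns `html` with a leading
/// `<!DOCTYPE ...>` declaration removed, so the remainder can be handed
/// to a parser that rejects legacy public/system identifiers.
/// Assumes no '>' appears inside the declaration's quoted identifiers.
pub fn strip_doctype(html: &str) -> &str {
    let trimmed = html.trim_start();
    let is_doctype = trimmed
        .get(..9)
        .map_or(false, |p| p.eq_ignore_ascii_case("<!DOCTYPE"));
    if !is_doctype {
        return html; // no DOCTYPE: return the input unchanged
    }
    match trimmed.find('>') {
        Some(end) => &trimmed[end + 1..],
        None => html, // unterminated declaration: leave the input untouched
    }
}
```

It would then be called as `html_parser::Dom::parse(strip_doctype(HTML))`. Whether 0.7.0 accepts the remaining markup is a separate question.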

TIA, and cheers!

langdonr621, Feb 21 '24 03:02