Closing comments handling
I found a corner case in which the end of a comment is not handled correctly, for example:
<!--x>Comment<!-->
The comment is terminated by the second <!-->, but the parser mistakenly treats everything after it as part of the comment as well.
The following snippet demonstrates the problem (output is empty instead of "HELLO WORLD"):
html::dom page;
page.append_partial_html("<!--x>Comment<!--><html><head><title>HELLO WORLD</title></head><body></body></html>");
std::cout << page["title"].to_plain_text() << std::endl;
According to the HTML5 specification, parsing of the comment should proceed as follows (input consumed on the left, resulting state or action on the right):
(start)       Data state
<!            Markup declaration open state
--            Comment start state
x             Comment state
>Comment<!    Append the current input character to the comment token's data
-             Comment end dash state
-             Comment end state
>             Data state
The current implementation is in the comment state (state = 12) while >Comment is being parsed, but it switches to state = 10 as soon as the <! characters are encountered:
case '<':
    {
        c = getc();
        if (c == '!') {
            pre_state = state;
            state = 10;
        } else {
            content += '<';
            content += c;
        }
    }
    break;
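
For comparison, the state sequence traced above can be sketched as a small standalone scanner. This is only an illustration of the expected behaviour, not the library's tokenizer and not a proposed patch: `State`, `ScanResult` and `scan_comments` are hypothetical names, the comment start states are collapsed into a single check for `<!--`, and end-of-file handling and the comment end bang state are omitted. The point it shows is that `<` and `!` inside a comment are ordinary comment data and must not trigger a state change.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified states; only what the trace above needs.
enum class State { Data, Comment, CommentEndDash, CommentEnd };

struct ScanResult {
    std::vector<std::string> comments;  // data of each completed comment
    std::string rest;                   // everything outside comments
};

ScanResult scan_comments(const std::string& in) {
    ScanResult out;
    std::string comment;
    State state = State::Data;
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        switch (state) {
        case State::Data:
            // "<!--" opens a comment (markup declaration open and
            // comment start are collapsed into one check here).
            if (in.compare(i, 4, "<!--") == 0) {
                comment.clear();
                state = State::Comment;
                i += 4;
                continue;
            }
            out.rest += c;
            break;
        case State::Comment:
            if (c == '-') {
                state = State::CommentEndDash;
            } else {
                comment += c;  // '<' and '!' are plain comment data
            }
            break;
        case State::CommentEndDash:
            if (c == '-') {
                state = State::CommentEnd;
            } else {
                comment += '-';  // the single dash was data after all
                comment += c;
                state = State::Comment;
            }
            break;
        case State::CommentEnd:
            if (c == '>') {
                out.comments.push_back(comment);  // comment is complete
                state = State::Data;
            } else if (c == '-') {
                comment += '-';  // "--" followed by more dashes
            } else {
                comment += "--";  // "--" was data, keep scanning
                comment += c;
                state = State::Comment;
            }
            break;
        }
        ++i;
    }
    return out;
}
```

Called with the input from the snippet at the top, this sketch yields a single comment whose data is x>Comment<! and leaves the <html>...</html> part untouched in `rest`, which is where the title would then be found.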