Extracting patterns of <br> within <p> yields html5ever warning

Open · shouya opened this issue 1 year ago · 0 comments

Hi, I noticed that some HTML snippets trigger html5ever warnings. After tracking down the cause, I found a minimal input that reproduces the issue:

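  use url::Url;

  // Placeholder base URL (assumption: any valid URL works for this repro)
  let url = Url::parse("https://example.com/").unwrap();
  let mut text = std::io::Cursor::new(String::from("<p><br/>a<br/>a</p>"));
  let product = readability::extractor::extract(&mut text, &url).unwrap();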

And here are the warning messages:

2024-06-19T14:08:40.634683Z  WARN html5ever::serialize: node with weird namespace Atom('' type=static)
2024-06-19T14:08:40.634723Z  WARN html5ever::serialize: node with weird namespace Atom('' type=static)

Note that if I remove the trailing "a" from the string (i.e. <p><br/>a<br/></p>), the warnings disappear completely.

shouya, Jun 19 '24 14:06