
Strict XHTML parsing?

Open snarkturne opened this issue 3 years ago • 1 comment

Some GitLab Atom feeds can't be parsed. I found that the problem is that GitLab feeds contain <br> tags instead of <br /> tags. These tags come from lines that end with a double space in Markdown. I think GitLab should translate this to <br /> and not <br>, but as a quick "fix", would it be possible for Readrops (I don't know which parser it uses) to accept <br> tags?

Here is an example of a rejected Atom feed:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
<title>Test Atom Feed</title>
<link href="https://no" rel="self" type="application/atom+xml"/>
<link href="https://no" rel="alternate" type="text/html"/>
<id>TestFeed</id>
<updated>2022-07-22T06:57:01Z</updated>
<entry>
  <id>tag:gitlab.com,2022-07-20:2009239869</id>
  <link href="no"/>
  <title>Sample Title</title>
  <updated>2022-07-20T13:29:28Z</updated>
  <author>
    <name>My Name</name>
  </author>
  <summary type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">
  <p>Test<br></p>
  </div>
  </summary>
</entry>
</feed>
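
To make the failure concrete, here is a minimal sketch that checks the two variants of the summary content with a strict XML parser. It uses Python's standard `xml.etree.ElementTree` purely as a stand-in; it is not the parser Readrops uses, and the helper name `is_well_formed` is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if a strict XML parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An unclosed <br> makes the following </p> a mismatched closing tag.
        return False


# Same content as the <summary> in the feed above, with and without the closing slash.
UNCLOSED = '<div xmlns="http://www.w3.org/1999/xhtml"><p>Test<br></p></div>'
SELF_CLOSED = '<div xmlns="http://www.w3.org/1999/xhtml"><p>Test<br /></p></div>'

print(is_well_formed(UNCLOSED))
print(is_well_formed(SELF_CLOSED))
```

Any strict XML parser is expected to behave the same way here, since `<br>` without a closing slash is valid HTML but not well-formed XML.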

snarkturne avatar Jul 26 '22 13:07 snarkturne

Yeah, that's annoying. It's clearly a problem on GitLab's side. Any kind of HTML inside an XML tag should be escaped or wrapped in a CDATA section. The library I use, Konsume-xml, can handle regular HTML inside an XML tag, but not malformed content.

I'll see what I can do.
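
One possible direction, sketched here only as an assumption and not as what Readrops or Konsume-xml actually does, is to pre-process the feed text and self-close a few HTML void tags before handing it to the strict parser. The function name `close_void_tags` and the tag list are hypothetical, and the example is in Python with the standard library rather than in the app's own code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, deliberately short list of HTML void elements to repair.
VOID_TAGS = ("br", "hr", "img")

# Match an opening void tag that does not already end with "/>".
_UNCLOSED_VOID = re.compile(
    r"<(" + "|".join(VOID_TAGS) + r")\b([^<>]*?)(?<!/)>"
)


def close_void_tags(xml_text: str) -> str:
    """Rewrite <br> as <br /> (and likewise for the other listed tags).

    Crude sketch: a regex cannot handle every case, for example an
    attribute value that contains a '>' character.
    """
    return _UNCLOSED_VOID.sub(r"<\1\2 />", xml_text)


broken = '<div xmlns="http://www.w3.org/1999/xhtml"><p>Test<br></p></div>'
repaired = close_void_tags(broken)

# The repaired text is now accepted by a strict XML parser.
root = ET.fromstring(repaired)
print(repaired)
```

Tags that are already self-closed (`<br />` or `<br/>`) are left untouched, so running the function on a well-formed feed should not change it. The trade-off is that this treats the symptom on the client side; the feed itself stays malformed.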

Shinokuni avatar Aug 14 '24 15:08 Shinokuni