
show_map.php should handle &amp; in url

Open johnksv opened this issue 3 years ago • 4 comments

Some RSS handlers HTML-escape special characters in the links, such as & -> &amp;. If the URLs are then accessed directly without being converted back, doma can't find the map.

Example:
Link from the doma RSS feed: https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&map=7191
Link after the feed has been processed by the W3C feed validator ( https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fkartarkiv.nydalen.idrett.no%2Frss.php ): https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&amp;map=7191

The latter link results in doma not finding the map and returning "The map has been removed." to the user. Related code: https://github.com/matstroeng/doma/blob/master/src/show_map.controller.php#L19

Suggested solution: Doma/PHP should accept &amp; as a URL parameter separator.

Edit: Think this must be solved in code, since:

PHP's URL parser does not expect to encounter HTML entities, because they should not be present in URLs; it therefore correctly splits the query string on &, treating the trailing amp; as part of the key. (source: https://stackoverflow.com/questions/17972654/amp-precedes-get-array-element-parameter-name)
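
The splitting behaviour described in the quote can be reproduced outside PHP. Below is a minimal sketch in Python, assuming that `urllib.parse.parse_qs` with an explicit `&` separator is a fair stand-in for PHP's query parsing in this particular case. It is an illustration only, not doma's code.

```python
from urllib.parse import urlsplit, parse_qs

# Link as it looks when the HTML-escaped feed value is used without decoding.
escaped_link = "https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&amp;map=7191"

query = urlsplit(escaped_link).query
# Split only on "&", which is what a query-string parser normally does.
params = parse_qs(query, separator="&")

print(params)           # the second key is "amp;map", not "map"
print("map" in params)  # False, so a lookup of "map" finds nothing
```

The `separator` argument requires a reasonably recent Python 3 release (3.10, or a patched 3.9 and earlier).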

johnksv avatar Mar 12 '23 13:03 johnksv

I think the root problem is that the rss-link uses the print statement - which HTML-encodes the url. https://github.com/matstroeng/doma/blob/master/src/rss.php#L27

Html encoding content is a safety measure, but in this case, the Doma service has full control over the url and can output the raw value in the link element.

runerys avatar Mar 21 '23 12:03 runerys

That is a good find @runerys , thanks. When I investigated this issue I just used the browser. It does of course decode the entities, thus I didn't notice this. When using curl to scrape the url the output is:

$ curl -v https://kartarkiv.nydalen.idrett.no/rss.php
…
<link>https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&amp;map=7191</link>
…

Will refactor the issue and PR to fix the root cause.

johnksv avatar Mar 21 '23 13:03 johnksv

Yes - and "View source" in the browser reveals the same.

I have to admit that I struggle to find out whether the RSS link element MUST be encoded according to the standards. I've searched around a bit, but all the examples I find are from blogs with nice folder-like URLs.

runerys avatar Mar 21 '23 13:03 runerys

Unfortunately, I was wrong. I found some validators, and the link element must be HTML-encoded. The error is in the client application that handles the feed and does NOT HTML-decode the link before opening it.
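
What a well-behaved feed client is expected to do can be sketched as follows. This is a hedged Python illustration using the standard library only. The `<item>` wrapper is an assumption made to get a parseable fragment; only the `<link>` line is taken from the curl output earlier in this thread.

```python
import html
import xml.etree.ElementTree as ET

# Hand-written fragment; the <item> wrapper is assumed for illustration.
fragment = (
    "<item>"
    "<link>https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&amp;map=7191</link>"
    "</item>"
)

# An XML parser decodes entities while reading element text.
item = ET.fromstring(fragment)
link = item.findtext("link")
print(link)

# A client that extracted the raw text by other means has to decode it itself.
raw = "https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&amp;map=7191"
print(html.unescape(raw) == link)
```

Both routes end up with a plain `&` between the parameters, which is the form show_map.php can handle.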

So I guess a fix must look more like your original proposal.
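
One possible shape for such a fix is to normalise the parameter names before they are looked up. The sketch below is in Python rather than doma's PHP, and `normalize_params` is a hypothetical helper name, not something that exists in the doma code base.

```python
def normalize_params(params: dict[str, str]) -> dict[str, str]:
    """Return a copy in which keys like 'amp;map' are also reachable as 'map'."""
    normalized = dict(params)
    for key, value in params.items():
        if key.startswith("amp;"):
            # Never overwrite a parameter that was sent correctly.
            normalized.setdefault(key[len("amp;"):], value)
    return normalized


# Parameters as the server would see them for the escaped link.
received = {"user": "vbj", "amp;map": "7191"}
fixed = normalize_params(received)
print(fixed["map"])
```

Keeping a correctly named parameter when both forms are present is a design choice made here so that a stray `amp;map` cannot override a valid `map`.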

You can try validating both urls and direct rss input here: https://validator.w3.org/feed/

runerys avatar Mar 21 '23 16:03 runerys