show_map.php should handle &amp; in url
Some RSS handlers HTML-escape special characters in links, e.g. & -> &amp;. If the URLs are then accessed directly without being converted back, Doma can't find the map.
Example:
Link from doma rss feed: https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&map=7191
Link after the feed has been processed by the W3C feed validator ( https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fkartarkiv.nydalen.idrett.no%2Frss.php ): https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&amp;map=7191
The latter link results in Doma not finding the map and returning "The map has been removed." to the user.
Related code:
https://github.com/matstroeng/doma/blob/master/src/show_map.controller.php#L19
Suggested solution: Doma/PHP should handle &amp; in URL parameters.
Edit: I think this must be solved in code, since:
PHP's URL parser does not expect to encounter HTML entities, because they should not be present in URLs; it therefore correctly splits the query string on &, treating the trailing amp; as part of the key. (source: https://stackoverflow.com/questions/17972654/amp-precedes-get-array-element-parameter-name)
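To make the quoted behaviour concrete, here is a minimal PHP sketch. It uses `parse_str`, which splits on `&` under the default `arg_separator.input` setting, to show which keys PHP ends up with. The helper `normalize_amp_keys` is hypothetical and not part of Doma; it is just one way a controller could tolerate such links, assuming no legitimate parameter name starts with `amp;`.

```php
<?php
// Hypothetical helper (not part of Doma): strip a leading "amp;" from
// parameter names so that "amp;map" is treated as "map".
function normalize_amp_keys(array $params): array
{
    $normalized = [];
    foreach ($params as $key => $value) {
        $key = (string) $key;
        // A link may have been escaped more than once ("amp;amp;map").
        while (strpos($key, 'amp;') === 0) {
            $key = substr($key, 4);
        }
        // Keep the first value seen for a given name.
        if (!array_key_exists($key, $normalized)) {
            $normalized[$key] = $value;
        }
    }
    return $normalized;
}

// Query string of a link that was never HTML-decoded by the client.
parse_str('user=vbj&amp;map=7191', $params);
// $params now has the keys "user" and "amp;map", so a lookup of "map" fails.

$fixed = normalize_amp_keys($params);
// $fixed has the keys "user" and "map".
```

In a controller, the same helper could be applied to `$_GET` before the `map` parameter is read, if tolerating undecoded links is considered worthwhile.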
I think the root problem is that the rss-link uses the print statement - which HTML-encodes the url.
https://github.com/matstroeng/doma/blob/master/src/rss.php#L27
Html encoding content is a safety measure, but in this case, the Doma service has full control over the url and can output the raw value in the link element.
That is a good find @runerys, thanks. When I investigated this issue I just used the browser. It does of course decode the entities, so I didn't notice this. When fetching the feed with curl, the output is:
$ curl -v https://kartarkiv.nydalen.idrett.no/rss.php
…
<link>https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&amp;map=7191</link>
…
Will refactor the issue and PR to fix the root cause.
Yes - and "View source" in the browser reveals the same.
I have to admit that I struggle to find out whether the RSS link element MUST be encoded according to the standards. I've searched around a bit, but all the examples I find are from blogs with nice folder-like urls.
Unfortunately, I was wrong. I found some validators, and the link element must be HTML-encoded. The error is in the client application that handles the feed and does NOT HTML-decode the link before opening it.
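As a rough illustration of the round trip, assuming the client extracts the link text itself instead of using an XML parser (which would decode entities for it), the two sides could look like this in PHP. The variable names are only for illustration.

```php
<?php
$url = 'https://kartarkiv.nydalen.idrett.no/show_map.php?user=vbj&map=7191';

// What the feed has to contain inside <link>: "&" is written as "&amp;".
$escaped = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');

// What a feed client has to do before opening the link.
$decoded = html_entity_decode($escaped, ENT_QUOTES | ENT_XML1, 'UTF-8');
// $decoded is identical to $url again.
```

A client that skips the second step requests the escaped form, which is the case described at the top of this issue.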
So I guess a fix must look more like your original proposal.
You can try validating both urls and direct rss input here: https://validator.w3.org/feed/