Handling ":" in href attribute
I've received a new issue on python-textile (textile/python-textile#27) that made me wonder what the appropriate course of action is. It boils down to this: how should ":" be handled within the href of a link? I usually use txstyle.org for guidance on special cases, but here the two diverge, and python-textile's behavior seems a little more sensible.
Consider the following:
- input:
this is some text to "a link":test:1234
- output from txstyle.org:
<p>this is some text to “a link”:test:1234</p>
- output from python-textile:
<p>this is some text to <a href="test%3A1234">a link</a></p>
Handling the text as a link is definitely better, but what should be done about the colon character? Is it correct to percent-encode it, or should it be left alone so the browser can interpret it? I should note that this is an issue for one user who has a bunch of custom URL schemes, which he links to in textile/python-textile#27. I'll work with him in the meantime to find a way around it, but this seems like an instance where some guidance from the spec would be helpful.
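To make the two options concrete, here is a minimal sketch using the standard library's `urllib.parse.quote`. It is only an illustration of the difference and is not meant to represent how python-textile actually builds the href; the variable names are made up for this example.

```python
from urllib.parse import quote

href = "test:1234"

# By default only "/" is treated as safe, so ":" gets percent-encoded.
encoded = quote(href)

# Adding ":" to the safe characters leaves the scheme separator intact.
preserved = quote(href, safe="/:")

print(encoded)    # test%3A1234
print(preserved)  # test:1234
```

With the first form a browser would presumably treat the href as a relative path, while the second keeps `test` readable as a scheme.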
Thanks.
@ikirudennis, Hi Dennis, thanks for pointing this out.
Looks like redcloth.org handles the colon the way you are suggesting as it gives <p>this is some text to <a href="test:1234">a link</a></p> as the output for your sample input.
I'll take a look at how php-textile handles the path part of URIs. I know it takes a restrictive (perhaps too restrictive) white-listing approach to allowed link schemes. This originated from some link-handling attacks against older versions of the PHP parser. Perhaps it's time to revisit this for cases where restricted mode is not in use.
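As a rough sketch of what relaxing the allow-list outside restricted mode could look like, written in Python for the sake of the python-textile discussion: the function name, the `restricted` flag, and the default scheme list below are all hypothetical and are not taken from php-textile or python-textile. The scheme pattern follows the RFC 3986 syntax (a letter followed by letters, digits, "+", "-", or ".").

```python
import re

# Scheme syntax per RFC 3986: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

# Hypothetical default allow-list, for illustration only.
DEFAULT_SCHEMES = frozenset({"http", "https", "ftp", "mailto"})


def is_allowed_href(href, allowed=DEFAULT_SCHEMES, restricted=True):
    """Return True if the href may be emitted as a link target."""
    match = SCHEME_RE.match(href)
    if match is None:
        # No scheme at all: a relative reference, always allowed here.
        return True
    if not restricted:
        # Outside restricted mode, custom schemes are let through.
        return True
    # In restricted mode, only allow-listed schemes pass.
    return match.group(1).lower() in allowed
```

Under these assumptions, `is_allowed_href("test:1234")` is rejected in restricted mode but accepted with `restricted=False`, while `javascript:` links stay blocked whenever restricted mode is on.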