Do autolinking during post composition, not during post display
The continuing proliferation of TLDs is causing a steady increase in complaints about unexpected links in text. For example, writing files.zip in a post now creates a link, because zip is now a TLD.
We could monkey around with the autolinker's regex to try to avoid such situations, but that's a perpetually losing game of catch-up. Moreover, even if we tried, we could never get the right result in every situation, because we simply can't read the user's mind. We need a different approach.
The Universal Acceptance Steering Group, "a community-based team of industry leaders supported by ICANN," published some recommendations regarding autolinking a few years ago. The most relevant part for our purposes is their recommendation that autolinking should happen during the input stage, not the output stage.
Currently, SMF does its autolinking during the output stage (specifically, while parsing BBC). This creates three closely related issues:
1. Users will sometimes see unexpected links in their posts after submitting them, with no warning during the authoring phase that this would happen.
2. Users have no easy means of removing unexpected links from their posts.
3. New links can appear in old posts when ICANN adds new TLDs.
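To make the third issue concrete, here is a deliberately simplified, hypothetical model of an output-stage autolinker. It is not SMF's actual regex or code; the names `autolinkAtDisplay` and `knownTlds` are illustrative only. The point it shows is that the TLD list is consulted every time a post is displayed, so the same stored text renders differently once a new TLD such as zip is added.

```typescript
// Hypothetical, heavily simplified model of an output-stage autolinker.
// NOT SMF's real implementation; it only shows that the rendered result
// depends on whichever TLD list is current at display time.
function autolinkAtDisplay(text: string, knownTlds: ReadonlySet<string>): string {
  // Match bare "label.label" hostnames such as "example.com" or "files.zip".
  const hostPattern = /\b((?:[a-z0-9-]+\.)+([a-z]{2,}))\b/gi;
  return text.replace(hostPattern, (match: string, host: string, tld: string) =>
    knownTlds.has(tld.toLowerCase()) ? `<a href="https://${host}">${host}</a>` : match,
  );
}

// The stored post text never changes; only the TLD list does.
const storedText = "See files.zip for details.";
const tldsBefore = new Set(["com", "org", "net"]);
const tldsAfter = new Set(["com", "org", "net", "zip"]);

console.log(autolinkAtDisplay(storedText, tldsBefore)); // See files.zip for details.
console.log(autolinkAtDisplay(storedText, tldsAfter)); // See <a href="https://files.zip">files.zip</a> for details.
```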
All of these problems would be solved if SMF did its autolinking during the input stage. This would mean writing our own custom SCEditor plugin that detects plain text URLs and automatically wraps them in url BBC tags. The user would then see each link being created on the fly during the authoring phase, which would solve issue 1. If the user didn't want a particular string to be treated as a link, they could select that string and remove the link using the editor's unlink toolbar button, which would solve issue 2. Moreover, because submitted URLs would always be explicitly tagged with the url BBC, we could safely assume that untagged URLs in posts written after the change were never meant to be linked, which would solve issue 3.
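A minimal sketch of the text transformation such a plugin might perform, written in TypeScript because SCEditor plugins are JavaScript. Everything here is an assumption rather than an existing SMF or SCEditor API: the function names `autolinkBbc` and `wrapUrls`, the conservative URL pattern (explicit http(s) scheme or "www." only), and the choice of which tags to leave untouched. Hooking it into SCEditor's plugin and event system is deliberately left out.

```typescript
// Hypothetical helper for an input-stage autolinker. It only transforms BBC
// source text; wiring it into SCEditor's plugin/event API is not shown.

// Assumed, conservative URL pattern: an explicit http(s) scheme or "www.".
const URL_PATTERN = /\b(?:https?:\/\/|www\.)[^\s\[\]<>"]+/gi;

// Tags whose contents must not be autolinked again: existing [url] links,
// images, and code blocks. The single capturing group makes String.split keep
// the matched tags in the output array (at odd indices).
const PROTECTED_TAGS =
  /(\[url(?:=[^\]]*)?\][\s\S]*?\[\/url\]|\[img[^\]]*\][\s\S]*?\[\/img\]|\[code[^\]]*\][\s\S]*?\[\/code\])/i;

// Wrap each bare URL in [url]...[/url], leaving common trailing punctuation
// (such as a sentence-ending period) outside the link.
function wrapUrls(text: string): string {
  return text.replace(URL_PATTERN, (match: string) => {
    const url = match.replace(/[.,;:!?)]+$/, "");
    const trailing = match.slice(url.length);
    return `[url]${url}[/url]${trailing}`;
  });
}

// Autolink only the text outside protected tags, so existing links are kept as-is.
function autolinkBbc(source: string): string {
  return source
    .split(PROTECTED_TAGS)
    .map((part: string, i: number) => (i % 2 === 1 ? part : wrapUrls(part)))
    .join("");
}

console.log(autolinkBbc("See https://example.com/docs. Or [url=https://a.test]A[/url] and www.b.test"));
// See [url]https://example.com/docs[/url]. Or [url=https://a.test]A[/url] and [url]www.b.test[/url]
```

One design point worth noting: a real plugin would probably apply this only to the URL the user has just finished typing (for example, when a space or Enter follows it) rather than re-scanning the whole post, because a full re-scan would immediately re-wrap a link the user had just removed with the unlink button.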
The one wrinkle in this plan is how to handle plain text URLs in old posts that were written with SMF's current autolinking behaviour in mind. The possible solutions I see are:
1. We record the date when the forum was upgraded to SMF 3.0, and then perform output-stage autolinking only if the post was last modified before that date.
2. We do in fact monkey around with the output-stage autolinker code to make it much more restrictive (e.g. only recognizing basic TLDs), so that our fallback for plain text URLs is very conservative in what it links.
3. We do both (1) and (2).
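As a rough sketch of how option (3) could combine the other two, the output-stage parser would link a bare host only if the post predates the upgrade and the host's TLD is on a short, fixed list. The `StoredPost` shape, the function names, the TLD list, and the upgrade date below are all hypothetical placeholders, not existing SMF code or settings.

```typescript
// Hypothetical shape of the data available for a stored post.
interface StoredPost {
  // Last modification time; for a never-edited post, assumed to be its posting time.
  lastModified: Date;
}

// Option (2): an assumed, deliberately short list of long-established TLDs.
const CONSERVATIVE_TLDS: ReadonlySet<string> = new Set(["com", "org", "net", "edu", "gov"]);

// Option (1): only posts last modified before the upgrade get legacy autolinking.
function isLegacyPost(post: StoredPost, upgradedAt: Date): boolean {
  return post.lastModified.getTime() < upgradedAt.getTime();
}

// Option (2): accept a bare host only if its final label is on the short list.
function hasConservativeTld(host: string): boolean {
  const tld = host.slice(host.lastIndexOf(".") + 1).toLowerCase();
  return CONSERVATIVE_TLDS.has(tld);
}

// Option (3): the output-stage parser links a bare host only when both hold.
function shouldAutolinkAtDisplay(host: string, post: StoredPost, upgradedAt: Date): boolean {
  return isLegacyPost(post, upgradedAt) && hasConservativeTld(host);
}

const upgradeDate = new Date(Date.UTC(2024, 0, 1)); // example value only
const oldPost: StoredPost = { lastModified: new Date(Date.UTC(2023, 5, 1)) };
const newPost: StoredPost = { lastModified: new Date(Date.UTC(2024, 5, 1)) };

console.log(shouldAutolinkAtDisplay("example.com", oldPost, upgradeDate)); // true
console.log(shouldAutolinkAtDisplay("files.zip", oldPost, upgradeDate)); // false
console.log(shouldAutolinkAtDisplay("example.com", newPost, upgradeDate)); // false
```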