ilbot

Links that end a parenthetical expression are broken

Open MasterDuke17 opened this issue 9 years ago • 3 comments

E.g., in (this is some text and then a link, hxxp://foo.bar.com/baz), the trailing ')' is included in the href. Real-world example here: http://irclog.perlgeek.de/perl6/2016-07-18#i_12863149

MasterDuke17, Jul 18 '16 19:07

Unfortunately, here's a counter-example that will break if this is fixed: http://irclog.perlgeek.de/ilbot/2016-07-27#i_12919928

Extracting URLs is really a heuristic. For example, a comma is a valid part of a URL, but trailing commas are typically part of the surrounding text.

Due to the prevalence of Wikipedia-URLs that end in a closing paren, I'm likely to reject this.
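
To make the trade-off concrete, here is a minimal Python sketch (not ilbot's actual code; `URL_RE`, `extract_urls`, and the example URLs are hypothetical). It treats a URL as a run of non-whitespace characters and trims a configurable set of trailing characters. With the default set, the parenthetical case from the issue keeps its ')'; adding ')' to the set fixes that case but truncates a Wikipedia-style URL.

```python
import re

# Assumption: a URL is "http(s)://" followed by any run of non-whitespace.
URL_RE = re.compile(r"https?://\S+")

def extract_urls(text, trailing=",.;:!?"):
    """Return URLs in text, trimming the given trailing characters."""
    return [m.group(0).rstrip(trailing) for m in URL_RE.finditer(text)]

# Trailing comma is treated as surrounding text and dropped.
print(extract_urls("see http://foo.bar.com/baz, then continue"))

# The reported bug: the closing paren stays attached to the link.
print(extract_urls("(some text and a link, http://foo.bar.com/baz)"))

# Naively trimming ')' as well cuts off a Wikipedia-style URL.
print(extract_urls("http://en.wikipedia.org/wiki/Foo_(bar)", trailing=",.;:!?)"))
```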

moritz, Jul 27 '16 17:07

What if I rework the regex so that a trailing ')' isn't matched unless there's also a '(' somewhere after the 'https?://'?
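
A rough Python sketch of that idea follows. It is an approximation under stated assumptions, not the real ilbot regex: instead of encoding the rule in the pattern itself, it post-processes each match and drops a trailing ')' only when the URL contains more ')' than '('. The names `URL_RE`, `trim_url`, and `extract_urls`, and the example URLs, are hypothetical.

```python
import re

# Assumption: a URL is "http(s)://" followed by any run of non-whitespace.
URL_RE = re.compile(r"https?://\S+")

def trim_url(url):
    """Strip trailing punctuation; strip ')' only if it has no matching '('."""
    while url:
        last = url[-1]
        if last in ",.;:!?":
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # Unbalanced closing paren: it belongs to the surrounding text.
            url = url[:-1]
        else:
            break
    return url

def extract_urls(text):
    return [trim_url(m.group(0)) for m in URL_RE.finditer(text)]

# Parenthetical case from the issue: the ')' is dropped.
print(extract_urls("(some text and a link, http://foo.bar.com/baz)"))

# Wikipedia-style URL: the balanced ')' is kept.
print(extract_urls("http://en.wikipedia.org/wiki/Foo_(bar)"))

# Both at once: only the outer ')' is dropped.
print(extract_urls("(see http://en.wikipedia.org/wiki/Foo_(bar))"))
```

The counting check stays a heuristic: a URL that legitimately ends in an unbalanced ')' would still be truncated.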

MasterDuke17, Jul 27 '16 19:07

As long as it doesn't lead to exploding complexity, I'd accept that.

moritz, Jul 27 '16 20:07