linkify
looseUrl option identifies text with multiple consecutive periods as a URL
Issue:
Currently, when linkify is used with the looseUrl option set to true, text matching the following patterns is identified as a URL:
pattern1 -> 'awdaw....aw'
pattern2 -> 'awdaw...wad...wadw'
and so on...
Expected behaviour:
Technically, these shouldn't be identified as URLs: they contain multiple consecutive periods, which makes them invalid URL patterns.
I traced this to looseUrlRegex. The issue arises from including . in the character class shown below, which allows multiple consecutive periods to match. Removing . from this class resolves the issue.
[-a-zA-Z0-9@:%._\+~#=]{2,256}
Complete looseUrlRegex
r'''^(.*?)((https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//="'`]*))'''
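
To make the report easier to reproduce, here is a minimal sketch that runs the regex above against the two sample strings. It uses Python's `re` module rather than Dart's `RegExp`, on the assumption that both engines treat this particular pattern the same way. The names `LOOSE_URL`, `LOOSE_URL_NO_DOT`, and `matched_url` are hypothetical and are not part of linkify.

```python
import re

# Pattern copied verbatim from the issue (a Dart raw string, evaluated here
# with Python's re module; assumed to behave the same for these inputs).
LOOSE_URL = re.compile(
    r'''^(.*?)((https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//="'`]*))'''
)

# Hypothetical variant: identical, except "." is removed from the first
# character class, as proposed in the issue.
LOOSE_URL_NO_DOT = re.compile(
    r'''^(.*?)((https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%_\+~#=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//="'`]*))'''
)


def matched_url(pattern, text):
    """Return the text captured as the URL (group 2), or None if no match."""
    m = pattern.match(text)
    return m.group(2) if m else None


SAMPLES = ["awdaw....aw", "awdaw...wad...wadw"]

if __name__ == "__main__":
    for text in SAMPLES:
        print(
            repr(text),
            "current:", matched_url(LOOSE_URL, text),
            "without '.':", matched_url(LOOSE_URL_NO_DOT, text),
        )
```

In this sketch the current pattern captures both sample strings in full, and the variant without . matches neither. One side effect worth checking before adopting the change: with the variant, a dotted host such as sub.example.com is captured only as example.com, because the first character class can no longer span the inner period.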