Link detection is not robust
Link detection currently uses some terrible regular expressions to find document links. This leads to false positives (links detected where there are none) as well as missed links. The upside is that the regular expressions are pretty fast.
Ideally we should instead get the link data from the markdown AST. However, markdown-it currently doesn't provide line/column data for link tokens (or even file offsets, unless I've missed something).
I previously explored using tree-sitter for this, but performance wasn't acceptable. Not sure if that's a fundamental issue or something we could fix. Having the position data from markdown-it would be a better solution, though.
Relevant issues:
- https://github.com/markdown-it/markdown-it/issues/821
- https://github.com/markdown-it/markdown-it/issues/675
- https://github.com/markdown-it/markdown-it/issues/125
Is it worth considering parsers other than markdown-it? Technically this project isn't coupled to markdown-it, although expecting tokens rather than an AST couples it in practice. mdast could be an option for getting link positions.
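To make the mdast idea concrete, here is a rough sketch of collecting link positions from an mdast-style tree. The node shapes are a locally declared subset of unist/mdast (1-based line and column), the tree is hand-built rather than produced by a real parser, and `collectLinks`, `toLspRange`, and `FoundLink` are hypothetical names, not part of this project.

```typescript
// Minimal subset of the unist/mdast node shapes, declared locally for the sketch.
interface UnistPoint { line: number; column: number; offset?: number; } // line/column are 1-based
interface UnistPosition { start: UnistPoint; end: UnistPoint; }
interface MdastNode { type: string; url?: string; position?: UnistPosition; children?: MdastNode[]; }

// Zero-based positions, as language-server style ranges expect.
interface LspPosition { line: number; character: number; }
interface LspRange { start: LspPosition; end: LspPosition; }
interface FoundLink { href: string; range: LspRange; }

// Convert a 1-based unist position into a zero-based range.
function toLspRange(position: UnistPosition): LspRange {
  return {
    start: { line: position.start.line - 1, character: position.start.column - 1 },
    end: { line: position.end.line - 1, character: position.end.column - 1 },
  };
}

// Depth-first walk that records every `link` node carrying position data.
function collectLinks(node: MdastNode, found: FoundLink[] = []): FoundLink[] {
  if (node.type === "link" && node.url !== undefined && node.position) {
    found.push({ href: node.url, range: toLspRange(node.position) });
  }
  for (const child of node.children ?? []) {
    collectLinks(child, found);
  }
  return found;
}

// Hand-built tree for the one-line document: See [the guide](./guide.md).
const sampleTree: MdastNode = {
  type: "root",
  children: [{
    type: "paragraph",
    children: [
      { type: "text" },
      {
        type: "link",
        url: "./guide.md",
        position: { start: { line: 1, column: 5 }, end: { line: 1, column: 28 } },
      },
    ],
  }],
};

const mdastLinks = collectLinks(sampleTree);
```

Note that the node position covers the whole link, brackets included, so an implementation that needs the range of just the href would still have to narrow it.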
Could an IMdLinkComputer interface be exposed, with createLanguageService accepting an optional linkComputer and defaulting to the existing regex-based implementation?
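Roughly what I have in mind, as a sketch only. Everything here is hypothetical: the shape of `IMdLinkComputer`, the `linkComputer` option, and `createLanguageServiceSketch` (a stand-in, not the real createLanguageService signature). The regex class is a toy placeholder for the existing implementation and only handles inline links.

```typescript
// Hypothetical shapes for illustration; the real project's types may differ.
interface LinkPosition { line: number; character: number; }
interface LinkRange { start: LinkPosition; end: LinkPosition; }
interface MdLinkSketch { href: string; range: LinkRange; }
interface TextDocumentLike { uri: string; getText(): string; }

// The proposed extension point.
interface IMdLinkComputer {
  getAllLinks(document: TextDocumentLike): MdLinkSketch[];
}

// Toy stand-in for the current regex-based default: finds [text](href) only.
class RegexLinkComputer implements IMdLinkComputer {
  getAllLinks(document: TextDocumentLike): MdLinkSketch[] {
    const links: MdLinkSketch[] = [];
    document.getText().split(/\r?\n/).forEach((text, line) => {
      const pattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
      let match: RegExpExecArray | null;
      while ((match = pattern.exec(text)) !== null) {
        const href = match[1] ?? "";
        // The href sits immediately before the closing parenthesis.
        const start = match.index + match[0].length - 1 - href.length;
        links.push({
          href,
          range: {
            start: { line, character: start },
            end: { line, character: start + href.length },
          },
        });
      }
    });
    return links;
  }
}

interface LanguageServiceOptionsSketch { linkComputer?: IMdLinkComputer; }
interface LanguageServiceSketch { getDocumentLinks(document: TextDocumentLike): MdLinkSketch[]; }

// Falls back to the regex-based computer when no custom one is supplied.
function createLanguageServiceSketch(options: LanguageServiceOptionsSketch = {}): LanguageServiceSketch {
  const linkComputer = options.linkComputer ?? new RegexLinkComputer();
  return { getDocumentLinks: (document) => linkComputer.getAllLinks(document) };
}

const readme: TextDocumentLike = {
  uri: "file:///readme.md",
  getText: () => "See [the guide](./guide.md) first.",
};

// Default behaviour: the regex-based computer is used.
const defaultLinks = createLanguageServiceSketch().getDocumentLinks(readme);

// A caller-supplied computer (e.g. one backed by mdast) replaces it entirely.
const customLinks = createLanguageServiceSketch({
  linkComputer: { getAllLinks: () => [] },
}).getDocumentLinks(readme);
```

With something like this, existing callers would see no change, and anyone who prefers accuracy over speed could plug in their own parser.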
I have a use-case where I'd like to use this project for linting links in docs (via the diagnostics) so robustness would be nice - if IMdLinkComputer were exposed I could make an mdast implementation to avoid the pitfalls of the regexes. For my needs the performance hit of parsing the documents twice is not a big concern, although I realize using two different parsers is less likely to be a solution for VS Code.
Adding that this fails consistently for MDX files whose front matter contains inline arrays.
For example:
countries: ['CA']
will produce error: No link definition found: ''CA''(link.no-such-reference)
I tried turning off markdown.validate.enabled in the project's VS Code settings, and it still flags the issues even after reloading VS Code and even after restarting it. (Reference: https://code.visualstudio.com/updates/v1_72#_markdown-link-validation)
A successful resolution of this issue should include confirmation/tests for cases with front-matter (with square bracket arrays in particular) and MDX syntax.
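A regression check for the front-matter case could look something like the sketch below. The pattern is a deliberately naive stand-in for shorthand reference detection, not the project's actual regex, and `findReferenceNames` and `maskFrontMatter` are hypothetical helpers. It reproduces the false positive on an inline array and shows one possible mitigation: blanking out a leading front-matter block before scanning, which keeps line numbers and offsets intact.

```typescript
// Naive shorthand-reference scan: any [name] not followed by "(", "[" or ":".
function findReferenceNames(text: string): string[] {
  const pattern = /\[([^\[\]]+)\](?![\[(:])/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    names.push(match[1] ?? "");
  }
  return names;
}

// Replace a leading YAML front-matter block with spaces, preserving newlines
// so that positions in the rest of the document do not shift.
function maskFrontMatter(text: string): string {
  const match = /^---\r?\n[\s\S]*?\r?\n---(?=\r?\n|$)/.exec(text);
  if (!match) {
    return text;
  }
  const blanked = match[0].replace(/[^\r\n]/g, " ");
  return blanked + text.slice(match[0].length);
}

const sample = [
  "---",
  "countries: ['CA']",
  "---",
  "",
  "See [docs] for details.",
  "",
  "[docs]: ./docs.md",
].join("\n");

// Scanning the raw text treats the YAML array as a reference named 'CA'.
const rawNames = findReferenceNames(sample);

// Scanning the masked text only sees the genuine reference.
const maskedNames = findReferenceNames(maskFrontMatter(sample));
```

MDX-specific syntax would need its own cases on top of this; the sketch only covers the front-matter array.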
UPDATE
Turns out I also needed to set the following in my VS Code settings, since my project uses both md and mdx:
"mdx.validate.validateReferences": "ignore",