vscode-markdown-languageservice

Link detection is not robust

Open mjbvz opened this issue 3 years ago • 2 comments

Link detection currently uses some terrible regular expressions to find document links. This leads to links being detected where there are none, and to some real links being missed. The upside is that the regular expressions are pretty fast.

Ideally we should instead get the link data from the markdown AST. However, markdown-it currently doesn't provide line/column data for link tokens (or even file offsets, unless I've missed something).


I previously explored using tree-sitter for this, but performance wasn't acceptable. I'm not sure if that's a fundamental issue or something we could fix. Having the position data from markdown-it would be a better solution, though.

Relevant issues:

  • https://github.com/markdown-it/markdown-it/issues/821
  • https://github.com/markdown-it/markdown-it/issues/675
  • https://github.com/markdown-it/markdown-it/issues/125

mjbvz avatar Jul 22 '22 00:07 mjbvz

Is it worth considering parsers other than markdown-it? Technically this project isn't coupled to markdown-it, although expecting tokens rather than an AST couples it in practice. mdast could be an option for getting link positions.

Could an IMdLinkComputer interface be exposed and allow providing an optional linkComputer to createLanguageService, with the default implementation being the existing regex-based one?

I have a use case where I'd like to use this project for linting links in docs (via the diagnostics), so robustness would be nice. If IMdLinkComputer were exposed, I could write an mdast implementation to avoid the pitfalls of the regexes. For my needs the performance hit of parsing the documents twice is not a big concern, although I realize using two different parsers is less likely to be a solution for VS Code.
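To make the proposal concrete, here is a rough sketch of what a pluggable link computer could look like. Everything in it is hypothetical: the shape of IMdLinkComputer, the FoundLink and LinkRange types, the createLinkService factory (a stand-in for createLanguageService), and the simplified regex are assumptions for illustration, not the library's actual API or implementation.

```typescript
// Hypothetical types for illustration only; not the library's real definitions.
interface LinkRange {
  readonly start: number; // offset of the first character of the link target
  readonly end: number;   // offset one past the last character of the target
}

interface FoundLink {
  readonly href: string;
  readonly range: LinkRange;
}

// The proposed extension point: anything that can extract links from text.
interface IMdLinkComputer {
  getAllLinks(text: string): FoundLink[];
}

// Greatly simplified stand-in for a regex-based default.
// It only understands inline links of the form [text](href).
class RegexLinkComputer implements IMdLinkComputer {
  getAllLinks(text: string): FoundLink[] {
    const links: FoundLink[] = [];
    const pattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      const href = match[1];
      // The href sits directly before the closing parenthesis of the match.
      const start = match.index + match[0].length - 1 - href.length;
      links.push({ href, range: { start, end: start + href.length } });
    }
    return links;
  }
}

// Hypothetical factory: callers may inject their own link computer,
// otherwise the regex-based default is used.
interface LinkServiceOptions {
  linkComputer?: IMdLinkComputer;
}

function createLinkService(options: LinkServiceOptions = {}) {
  const linkComputer = options.linkComputer || new RegexLinkComputer();
  return {
    getLinks: (text: string): FoundLink[] => linkComputer.getAllLinks(text),
  };
}
```

With a shape like this, an mdast-backed implementation would only need to implement getAllLinks and be passed in as linkComputer; callers that pass nothing would keep the regex behavior.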

dsanders11 avatar Oct 20 '22 22:10 dsanders11

Adding that this fails consistently for mdx files with front-matter that has inline arrays.

For example:

countries: ['CA']

will produce the error: No link definition found: ''CA''(link.no-such-reference)
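As a guess at the mechanism, a scanner that treats any bracketed text as a shortcut reference link would pick up the YAML array. The sketch below uses a deliberately naive, hypothetical pattern (not the expression this library uses, and the real cause may differ) and shows one possible mitigation: blanking out a leading front-matter block before scanning so that offsets in the rest of the document stay unchanged.

```typescript
// Hypothetical, deliberately naive pattern for shortcut reference links such as [label].
// This is NOT the regular expression used by the library.
const shortcutReference = /\[([^\[\]]+)\]/g;

// Collects every bracketed label the naive pattern matches.
function findReferenceLabels(text: string): string[] {
  const labels: string[] = [];
  shortcutReference.lastIndex = 0; // reset state of the shared global regex
  let match: RegExpExecArray | null;
  while ((match = shortcutReference.exec(text)) !== null) {
    labels.push(match[1]);
  }
  return labels;
}

// Replaces a leading front-matter block (delimited by --- lines) with spaces,
// keeping line breaks so later offsets and line numbers are preserved.
function maskFrontMatter(text: string): string {
  const block = /^---\r?\n[\s\S]*?\r?\n---[ \t]*(?:\r?\n|$)/.exec(text);
  if (!block) {
    return text;
  }
  return block[0].replace(/[^\r\n]/g, " ") + text.slice(block[0].length);
}

const sample = "---\ncountries: ['CA']\n---\n\nSee [guide].\n";

// Scanning the raw text reports the YAML array as a reference label.
const rawLabels = findReferenceLabels(sample); // ["'CA'", "guide"]

// Scanning after masking the front matter leaves only the real reference.
const maskedLabels = findReferenceLabels(maskFrontMatter(sample)); // ["guide"]
```

Masking rather than stripping the block is a design choice: any ranges computed afterwards still point at the right place in the original file.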

I tried turning off markdown.validate.enabled in project vscode settings and it still flags the issues even after reloading VSCode and then even after restarting it... (reference: https://code.visualstudio.com/updates/v1_72#_markdown-link-validation)

A successful resolution of this issue should include confirmation/tests for cases with front-matter (with square bracket arrays in particular) and MDX syntax.

UPDATE

Turns out I needed to disable the following as well in vscode settings... My project is using md + mdx.

"mdx.validate.validateReferences": "ignore",

firxworx avatar Apr 14 '24 20:04 firxworx