stdlib icon indicating copy to clipboard operation
stdlib copied to clipboard

[RFC]: add fuzzy auto-completion in REPL

Open Snehil-Shah opened this issue 1 year ago • 10 comments

Description

This RFC proposes adding fuzzy auto-completion extending the current strict auto-completion. This would allow us to forgive things like spelling mistakes and suggest more relevant completions.

Related Issues

Related issues https://github.com/stdlib-js/google-summer-of-code/issues/1

Questions

I played around a bit trying to write a fuzzy matching algorithm and using existing ones. Right now we are lexicographically sorting the completion results. should we do the same with fuzzy results mixed in? or should aim to sort results by 'relevancy'?

Other

No.

Checklist

  • [X] I have read and understood the Code of Conduct.
  • [X] Searched for existing issues and pull requests.
  • [X] The issue name begins with RFC:.

Snehil-Shah avatar Mar 12 '24 14:03 Snehil-Shah

@Snehil-Shah Thanks for opening this. I think if we are to add support for fuzzy auto-completion, we'd want to rank/list according to relevancy, similar to how fuzzy search might work in a search engine. Lexicographic makes sense currently, as completions are exact prefix matches and we don't have additional criteria for sorting otherwise.

Similar to a search engine displaying results, we probably want a means for visually highlighting what is matching (e.g., if I type ys, then we could mark the fuzzy match in bold yes).

kgryte avatar Mar 15 '24 00:03 kgryte

Recently came across https://github.com/JunoLab/FuzzyCompletions.jl/blob/master/src/FuzzyCompletions.jl which uses Levenshtein distance and a fudge factor for scoring suggestions.

kgryte avatar Mar 26 '24 23:03 kgryte

...and also a fuzzy completer implementation in Prompt Toolkit: https://github.com/prompt-toolkit/python-prompt-toolkit/blob/master/src/prompt_toolkit/completion/fuzzy_completer.py

kgryte avatar Mar 27 '24 03:03 kgryte

@kgryte I don't think levenshtein distance is a good algorithm for code-completion as mentioned in OP of #1855 as a difference in length of the input and completion would also affect the score, which means it can score an exact prefix match worse than a not-so-exact match with a shorter length.

The prompt toolkit's algorithm is good and simple, but is not forgiving to spelling mistakes, as it requires each letter of the input string to exist in the completion string. I think we can modify this a bit using my own algorithm (#1855) to account for spelling mistakes as well

Snehil-Shah avatar Mar 27 '24 08:03 Snehil-Shah

@Snehil-Shah Yes, that's fair. Although you could probably combine Levenshtein with other approaches, such as favoring exact prefix matches.

kgryte avatar Mar 27 '24 08:03 kgryte

@Snehil-Shah Yes, that's fair. Although you could probably combine Levenshtein with other approaches, such as favoring exact prefix matches.

I think prompt toolkit's algorithm would be better than the levenshtein's. I edited the above reply in case you didn't read that..

Snehil-Shah avatar Mar 27 '24 08:03 Snehil-Shah

@Snehil-Shah That's fair. At this point, you are the expert!

kgryte avatar Mar 27 '24 08:03 kgryte

Another fuzzy auto-completion implementation: https://github.com/codemirror/autocomplete/blob/5ad2ebc861f2f61cdc943fc087a5bfb756a7d0fa/src/filter.ts

kgryte avatar Mar 28 '24 23:03 kgryte

Another fuzzy match algorithm: https://github.com/junegunn/fzf/blob/master/src/algo/algo.go

kgryte avatar Apr 05 '24 19:04 kgryte

Another fuzzy match algorithm: https://github.com/helix-editor/nucleo/tree/master/matcher/src

kgryte avatar Apr 05 '24 19:04 kgryte