rust-analyzer Implement server-side completion sorting and filtering

At the moment, we rely entirely on client-side sorting of completions. That's bad, and we should implement server side sorting. The big steps here are:

Design the API for composable completion item ordering
Implement the base version of this API
Hack LSP into accepting serve-provided completion order
Polish specific completion routines until the sort order is acceptable

Design

Each completion item should have a CompletionScore object. Scores are partially comparable. Score should record, precisely why we think this variant may be better than average and should return facts like "do the types align exactly?", "are we looking for a non-static function here", etc. I think it should be a bunch of bools. It currently is

pub enum CompletionScore { TypeMatch, TypeAndNameMatch }

I don't think that's the right repr.

After scores, we should implement fuzzy matching. We should look at the identifier being completed, and comput, for each CompletionItem, how good the match is, throwing away the completions which are not relevant.

Then, we should sort the final list, taking into account both the fuzzy score and CompletionScore (does it need a better name? MatchBits?)

Implementation

Fuzzy scoring needs to be implemented from scratch I think. Scores are partially there

LSP Hacking

LSP actually insists that the client does sorting&filtering, so we need to hack around that, see https://github.com/microsoft/language-server-protocol/issues/898 for details. I believe we need to:

set isIncomplete to true
set sort text to format!("{04}", i) where i is our final order

Polishing

We produce a ton of garbage completions at the moment, so we should take care to actually design and use the appropriate scores.

Mar 09 '21 13:03 matklad

cc @SomeoneToIgnore

Mar 09 '21 13:03 matklad

I was thinking on making a few benchmarks and maybe a few more path-handling improvements before even starting to think about this myself.

Should we also close https://github.com/rust-analyzer/rust-analyzer/issues/4544 ?

Mar 09 '21 14:03 SomeoneToIgnore

#7945 refactors some code, renaming CompletionScore to Relevance

Mar 09 '21 17:03 matklad

Thanks for writing this up! I'll give this a shot :+1:

It seems #7945 doesn't change that we don't sort type+name matches above just type matches. In the short term, is there a downside to taking the approach I proposed in #7904, which is to set the sortText for all items to some value [1, 3]?

I think long term we plan to be setting the value for sortText for all items anyway. So I think my question is partly, can this work be rolled out to users in stages, where stage 1 is the sorting and stage 2 is the filtering.

In stage 1 we could be building out lots of Relevance criteria, and it would be nice if during stage 1 we were setting sortText in such a way that users could see the improved sorting right away.

Then stage 2 would be filtering, where we'd do fuzzy matching and have to deal with the LSP Hacking you've described above (I'm still reading through the github issue linked there). Basically I can see the fuzzy matching and lsp hacking bringing on a whole other set of challenges, and it would be nice if we could avoid blocking the benefits of stage 1 behind the release of stage 2.

Does this seem feasible?

Mar 09 '21 20:03 JoshMcguigan

Yeah, I think that might make sense. I haven’t looked closely into that PR, but, if it works better in vs code than what we have today, I think we should merge.

On Tuesday, 9 March 2021, Josh Mcguigan [email protected] wrote:

Thanks for writing this up! I'll give this a shot 👍

It seems #7945 https://github.com/rust-analyzer/rust-analyzer/pull/7945 doesn't change that we don't sort type+name matches above just type matches. In the short term, is there a downside to taking the approach I proposed in #7904 https://github.com/rust-analyzer/rust-analyzer/pull/7904, which is to set the sortText for all items to some value [1, 3]?

I think long term we plan to be setting the value for sortText for all items anyway. So I think my question is partly, can this work be rolled out to users in stages, where stage 1 is the sorting and stage 2 is the filtering.

In stage 1 we could be building out lots of Relevance criteria, and it would be nice if during stage 1 we were setting sortText in such a way that users could see the improved sorting right away.

Then stage 2 would be filtering, where we'd do fuzzy matching and have to deal with the LSP Hacking you've described above (I'm still reading through the github issue linked there). Basically I can see the fuzzy matching and lsp hacking bringing on a whole other set of challenges, and it would be nice if we could avoid blocking the benefits of stage 1 behind the release of stage 2.

Does this seem feasible?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/rust-analyzer/rust-analyzer/issues/7935#issuecomment-794429704, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANB3M43I3FAJYOPHFSXEKDTCZ3AXANCNFSM4Y3RWYNQ .

Mar 09 '21 21:03 matklad

I've updated #7904 (rebased since #7945 merged), which will fix the current sorting so type+name matches are above just type matches. I believe it will also serve as a reasonable base for continued work on building out improved relevance scoring.

Mar 12 '21 03:03 JoshMcguigan

See #7686 for an example of less than ideal sorting.

Mar 14 '21 20:03 JoshMcguigan

Each completion item should have a CompletionScore object

clangd has this: https://clangd.llvm.org/extensions.html#code-completion-scores

Mar 22 '21 07:03 fannheyward

Triage: We now have a CompletionRelevance struct, compute a score from that, and set the LSP sortText accordingly. Is there anything left to do or can this be closed?

Apr 11 '22 17:04 jonas-schievink

I think the crucial think we still don't do is the fuzzy matching of what the user has typed with the identifier we want to complete. That is, I belive current scorring completley ignores edit distance (double check me). Makes sense to open a separate issue for that maybe?

Apr 11 '22 17:04 matklad

https://github.com/rust-lang/rust-analyzer/pull/13124 shows an interesting use-case for this: sometimes (bindgen mostly) the amount of symbols we want to complete is humongous, which actually creates performance problems. So, ideally, we include a pre-filtering step: not only we want to sort & cull completions after we build the whole list, but, even while we are producing completions, we should keep only the 1000s best according to fuzzy matching (computing just the label for fuzzy matching is much cheaper than rendering the whole completion item).

Sep 18 '22 12:09 matklad

Closing this in favor of https://github.com/rust-lang/rust-analyzer/issues/14158 and https://github.com/rust-lang/rust-analyzer/issues/14159

Feb 15 '23 17:02 Veykril