Organize Imports Diagnostic and Code Action

This PR adds a diagnostic and a code action to organize imports in the current Zig file. The code action is available in the command palette as organize @import.

It is designed to be fairly customizable, so that we can arrive at a acceptable sorting style, like different grouping or case sensitivity. All decisions, especially those around diagnostics are very much open to critique as I do not have experience with the LSP.

Closes #1637, continues work from #881.

Design

An import is a top level declaration of a variable that uses the @import builtin. A top level declaration that uses import variables (like const print = std.debug.print;) is not considered an import and stays in place.

The sorting splits imports into three separated groups: builtin packages, packages and project files. The builtin packages are sorted in the order: std, builtin, build_options. The sorting inside other groups is done by comparing the declaration names (the x in const x = @import("file.zig");).

The feature is tested both in unit tests and in neovim. The unit tests required a new helper function.

Diagnostic

The diagnostic is enabled under the warn_style zls config option, which is disabled by default. The diagnostic's range is the entire line of the import which would be moved by organizing imports. That means one diagnostic per wrongly placed import line.

The diagnostic does not consider the correctness of separators between imports.

Code Action

The code action is available based on the diagnostic. This means that users with the warn_style option disabled will not be offered the code action.

The code action always sorts. The previous PR had optimization using isSorted to avoid sorting when it was not necessary. This, however, did not work well in some corner cases.

The documentation comments (///) are moved with the import they are attached to. It also respects Top-Level Doc Comments (//!), which must appear before any expression. This is handled using the commments token location.

This feature required running generateCodeAction for each diagnostic provided in request. Previously, it ran the function only on diagnostics from getAstCheckDiagnostics.

Considerations and notes

Top-level declarations (that includes imports) are order-independent (Source and tested)
zig fmt and textDocument/formatting does not interfere with the sorting.
Only root declarations are considered for sorting (no import inside functions).
Will not work with @import(variable); (we don't do comptime)
Resolving one diagnostic will resolve all diagnostics in the file (since all imports are moved).
I removed the part dealing with intersection with action's range (builder.range field which no longer exists). If it was relevant, I can study it further.
The code action works without warn_style as seen in tests. It is just that user cannot call it without the diagnostic.
std.sort.sort does not exist anymore. I picked std.sort.heap as a replacement. My understanding is that we do not need a stable sort here, but feel free to correct me and suggest a different sort from the std library.

Unresolved questions

[ ] Allow for const print = std.debug.print; to be sorted as well in some way?
[ ] How to treat @import("root")? (Currently a package, which does not seem right)
[ ] The getSortSlice function ported from previous PR seems weird. I would just do return self.name and not look in the import path. Any reasoning behind this?
[ ] What about grouping by directory for the file imports? Perhaps if there are > 1 file imports from the same directory, they could be in a group of their own.
[ ] I think DiagnosticKind could be parsed from the diagnostic code, which would be cleaner and would easily allow more versions of the diagnostic message. Or use the data field of the diagnostic message.
[ ] What about @embedFile?
[ ] How to implement ignoring the diagnostic? Ast.zig, for example, imports extra stuff near end of the file for tests.

Oct 09 '24 19:10 Sekky61

I'm not sure about this being behind warn_style, the style guide makes no mention of the order or location of imports.

A problem for having a "blessed" location in the file for imports, is a few projects (including zig itself) are actually moving to have all imports at the bottom of files. Due to feeling like imports at the top are usually just noise to be scrolled past to get to the meat of the file.

Also I can't get the code action to apply in vscode, neither as a quick fix nor in the command palette.

Oct 09 '24 20:10 leecannon

Interesting stuff! I agree, I just wanted the diagnostic to not be on by default (people have preferences and it could be annoying). So two solutions that come to mind are 1) creating an entirely new config flag or 2) scrapping the diagnostic and having the code action just available whenever.

I will take a look at the problem you mention. It is supposed to appear when having the cursor on the diagnostic.

Oct 09 '24 20:10 Sekky61

Thank you for taking the time to revive #881.The detailed an thorough description of this PR is very helpful. I will try to answer/address some of the points that you have mentioned.

How to treat @import("root")? (Currently a package, which does not seem right)

There are three "builtin packages":

std
builtin
root

build_options is not one of them since it added through the build.zig. I could understand adding it to the other "builtin packages" since its a very common convention.

std.sort.sort does not exist anymore.

This function has been moved to std.mem.sort.

Will not work with @import(variable); (we don't do comptime)

Zig requires that the first parameter of @import is a string literal so this is not an issue.

The getSortSlice function ported from previous PR seems weird. I would just do return self.name and not look in the import path. Any reasoning behind this?

IIRC, this function was added to make it easier to control which string/name is used to do sorting. #881 sorted imported files like const foo = @import("bar/baz.zig") by the filename (baz.zig) instead of foo. It could become useful if we decode to change the logic later on.

I removed the part dealing with intersection with action's range (builder.range field which no longer exists). If it was relevant, I can study it further.

This was added as an optimization. The given range tells us for which part of the document the code actions should be computed. If the imports are not visible then there is no need to compute the code action. It not necessary to bring this back to get this PR merged.

The documentation comments (///) are moved with the import they are attached to.

What about non doc-comments (//)? One TODO task I had in #881 was "preserve comments". This was especially tricky for non doc-comments. The currently implementation moves them all after the imports. I could foresee a situation where the users want to keep them between the imports for some reason. I am not sure what the best approach would be. We can leave this as a followup task.

Also I can't get the code action to apply in vscode, neither as a quick fix nor in the command palette.

I have the same problem. Using the "Source Action..." command in VS Code is the only way I was able to run the code actions.

I would also prefer to not have this feature rely on warn_style. I plan on removing this option at some point to replace it with a linter. It is possible to have code actions without reporting a diagnostic. With #1093, editors that support it can run source.organizeImports code actions automatically on save.

Oct 12 '24 18:10 Techatrix

Allow for const print = std.debug.print; to be sorted as well in some way?

There are two possibles I could imagine:

Move alias to the end of each section

const std = @import("std");
const Ast = std.zig.Ast;
const Server = std.zig.Server;

const lsp = @import("lsp");
const AnyTransport = lsp.AnyTransport;
const types = lsp.types;

const DocumentScope = @import("../DocumentScope.zig");
const Declaration = DocumentScope.Declaration;
const Scope = DocumentScope.Scope;

Move alias after all imports

const std = @import("std");

const lsp = @import("lsp");

const DocumentScope = @import("../DocumentScope.zig");

// Not sure if we should sort by variable name or "path". I think path is more reasonable
const Ast = std.zig.Ast;
const Server = std.zig.Server;
const AnyTransport = lsp.AnyTransport;
const types = lsp.types;
const Declaration = DocumentScope.Declaration;
const Scope = DocumentScope.Scope;

I would personally prefer the first option but I am open to suggestions.

What about @embedFile?

Maybe move them into their own section after all @import("file.zig") imports.

Oct 12 '24 19:10 Techatrix

I am happy to say that I made progress on the VS Code situation. First of all, the import organizing does work with editor.codeActionsOnSave setting:

https://github.com/user-attachments/assets/a11379e6-0e01-42d6-84c4-90cd90cd40c7

But as you pointed out, the Organize Imports does not show up in the Command pallet and the shortcut does not work either. So I investigated with other LSPs. Rust-analyzer did not work either. But Typescript LSP did.

After looking around the code, I found the difference in registering codeActionProvider. This is the change:

--- a/src/Server.zig
+++ b/src/Server.zig
@@ -615,7 +615,9 @@ fn initializeHandler(server: *Server, arena: std.mem.Allocator, request: types.I
             },
             .documentHighlightProvider = .{ .bool = true },
             .hoverProvider = .{ .bool = true },
-            .codeActionProvider = .{ .bool = true },
+            .codeActionProvider = .{ .CodeActionOptions = .{ .codeActionKinds = &.{
+                .@"source.organizeImports",
+            } } },
             .declarationProvider = .{ .bool = true },
             .definitionProvider = .{ .bool = true },
             .typeDefinitionProvider = .{ .bool = true },

And it's there.

20241017_18h01m53s_grim

I suppose it has to be explicitly listed as supported. So how do we implement this? Simply create an array of all the code actions we support?

Oct 17 '24 16:10 Sekky61

I suppose it has to be explicitly listed as supported. So how do we implement this? Simply create an array of all the code actions we support?

I don't think there is any alternative to just hard-coding the supported code action kinds. I also noticed that Sublime Text LSP requires source.fixAll to be explicitly listed so that their lsp_code_actions_on_save setting works.

Oct 19 '24 17:10 Techatrix

Moving aliases to the right group has been implemented. I went for grouping based on the parent import:

const abc = @import("abc.zig");
const abc_related = abc.related;
const abc_related2 = abc.related2;
const xyz = @import("xyz.zig");
const xyz_related = xyz.related;

I am interested in what you think could be improved

Oct 22 '24 19:10 Sekky61

Thanks for collaborating and taking the time to teach me more about Zig!

The followup tasks would need some benchmarking (never optimize blindly). Do you have a recommended microbenchmark setup for Zig?

Oct 26 '24 13:10 Sekky61

The followup tasks would need some benchmarking (never optimize blindly). Do you have a recommended microbenchmark setup for Zig?

This post on ziggit.dev may be useful to you. My usually workflow involves perf, hotspot and poop. I created a separate branch at techatrix/code-action-organize-imports-bench which features a benchmark and some optimizations. I chose to create a standalone executable instead of opting for some microbenchmarking library. It creates a synthetic zig file with various configuration option. I chose to test on files with thousands of imports to stress test the implementation and to make any startup time of the benchmark negligible in comparison.

If you wish to use this as an opportunity to learn more about optimizing code then you can just use my benchmarking program and try to optimize it yourself. I would still suggest to use f27d726035d1084b9cc8ec32f659ed534dd45ed1 because it just performs a more efficient version of calling offsets.locToRange in a loop on the same document. The changes to getImportsDecls are featured in the next commit.

Here is a showcase of the optimized version performed poop:

Show Results

poop ./zig-out/bin/bench2-safe-old ./zig-out/bin/bench2-safe-new
Benchmark 1 (3 runs): ./zig-out/bin/bench2-safe-old
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          17.1s  ±  130ms    17.0s  … 17.3s           0 ( 0%)        0%
  peak_rss           50.1MB ± 56.8KB    50.1MB … 50.2MB          0 ( 0%)        0%
  cpu_cycles         65.4G  ±  241M     65.3G  … 65.7G           0 ( 0%)        0%
  instructions        109G  ± 42.5K      109G  …  109G           0 ( 0%)        0%
  cache_references   70.9M  ± 2.14M     69.1M  … 73.3M           0 ( 0%)        0%
  cache_misses        389K  ±  116K      257K  …  477K           0 ( 0%)        0%
  branch_misses       129M  ± 38.3K      129M  …  129M           0 ( 0%)        0%
Benchmark 2 (4 runs): ./zig-out/bin/bench2-safe-new
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.31s  ± 2.75ms    1.31s  … 1.32s           0 ( 0%)        ⚡- 92.3% ±  0.9%
  peak_rss            108MB ± 74.6KB     108MB …  108MB          0 ( 0%)        💩+115.2% ±  0.3%
  cpu_cycles         4.83G  ± 8.43M     4.82G  … 4.84G           0 ( 0%)        ⚡- 92.6% ±  0.5%
  instructions       16.4G  ±  105K     16.4G  … 16.4G           0 ( 0%)        ⚡- 84.9% ±  0.0%
  cache_references   16.4M  ± 3.69M     13.9M  … 21.8M           0 ( 0%)        ⚡- 76.9% ±  8.7%
  cache_misses        143K  ± 47.9K      110K  …  213K           0 ( 0%)        ⚡- 63.2% ± 41.5%
  branch_misses      4.45M  ± 14.7K     4.43M  … 4.47M           0 ( 0%)        ⚡- 96.6% ±  0.0%

The speedup is less extreme on smaller (more realistic) files.

Oct 26 '24 20:10 Techatrix

Great success!

Bench - reproduced results

❯  poop zig-out/bin/bench-orig zig-out/bin/bench-techatrix
Benchmark 1 (3 runs): zig-out/bin/bench-orig
measurement          mean ± σ            min … max           outliers         delta
wall_time          95.0s  ± 2.43s     93.4s  … 97.8s           0 ( 0%)        0%
peak_rss           53.1MB ±  135KB    53.0MB … 53.2MB          0 ( 0%)        0%
cpu_cycles          425G  ± 1.60G      424G  …  427G           0 ( 0%)        0%
instructions       1.01T  ±  246K     1.01T  … 1.01T           0 ( 0%)        0%
cache_references   1.58G  ± 35.9M     1.55G  … 1.62G           0 ( 0%)        0%
cache_misses       43.3M  ± 5.08M     38.1M  … 48.2M           0 ( 0%)        0%
branch_misses       141M  ± 42.0K      141M  …  141M           0 ( 0%)        0%
Benchmark 2 (3 runs): zig-out/bin/bench-techatrix
measurement          mean ± σ            min … max           outliers         delta
wall_time          43.2s  ±  189ms    43.0s  … 43.4s           0 ( 0%)        ⚡- 54.5% ±  4.1%
peak_rss            111MB ±  154KB     111MB …  111MB          0 ( 0%)        💩+108.8% ±  0.6%
cpu_cycles          192G  ± 1.01G      191G  …  193G           0 ( 0%)        ⚡- 54.7% ±  0.7%
instructions        478G  ±  228K      478G  …  478G           0 ( 0%)        ⚡- 52.7% ±  0.0%
cache_references    540M  ± 36.6M      514M  …  582M           0 ( 0%)        ⚡- 65.8% ±  5.2%
cache_misses       15.7M  ± 1.01M     14.7M  … 16.7M           0 ( 0%)        ⚡- 63.8% ± 19.1%
branch_misses      11.5M  ± 68.8K     11.5M  … 11.6M           0 ( 0%)        ⚡- 91.8% ±  0.1%

I noticed from the profiler that the lookupSymbolGlobal was taking >90% of total time. I tried to cache known decls by string name, it got faster by about 30%, and +10% memory consumption.

Then I circumvented the lookupSymbolGlobal entirely by getting the root scope manually. This should be okay, since we only care about decls in root scope. This yielded another 94% improvement for a total improvement of 97%!

Bench - direct root scope lookup (improvement over the first optimization)

Benchmark 1 (3 runs): bench-techatrix
measurement          mean ± σ            min … max           outliers         delta
wall_time          42.7s  ±  604ms    42.1s  … 43.3s           0 ( 0%)        0%
peak_rss            111MB ± 18.8KB     111MB …  111MB          0 ( 0%)        0%
cpu_cycles          190G  ± 1.55G      189G  …  191G           0 ( 0%)        0%
instructions        478G  ±  855K      478G  …  478G           0 ( 0%)        0%
cache_references    552M  ± 29.6M      530M  …  585M           0 ( 0%)        0%
cache_misses       19.6M  ±  947K     18.5M  … 20.4M           0 ( 0%)        0%
branch_misses      11.6M  ± 48.9K     11.5M  … 11.6M           0 ( 0%)        0%
Benchmark 2 (3 runs): bench-root-scope-no-strings
measurement          mean ± σ            min … max           outliers         delta
wall_time          2.56s  ± 23.3ms    2.54s  … 2.59s           0 ( 0%)        ⚡- 94.0% ±  2.3%
peak_rss            111MB ± 75.9KB     111MB …  111MB          0 ( 0%)          -  0.1% ±  0.1%
cpu_cycles         9.39G  ± 26.2M     9.37G  … 9.42G           0 ( 0%)        ⚡- 95.0% ±  1.3%
instructions       27.0G  ±  413K     27.0G  … 27.0G           0 ( 0%)        ⚡- 94.4% ±  0.0%
cache_references    107M  ±  571K      106M  …  107M           0 ( 0%)        ⚡- 80.6% ±  8.6%
cache_misses       5.33M  ± 64.1K     5.28M  … 5.40M           0 ( 0%)        ⚡- 72.7% ±  7.8%
branch_misses      10.6M  ± 33.6K     10.5M  … 10.6M           0 ( 0%)        ⚡-  8.8% ±  0.8%

Bench - total difference

Benchmark 1 (3 runs): bench-before
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          94.9s  ±  292ms    94.6s  … 95.2s           0 ( 0%)        0%
  peak_rss           53.2MB ±  119KB    53.1MB … 53.3MB          0 ( 0%)        0%
  cpu_cycles          427G  ± 1.03G      426G  …  428G           0 ( 0%)        0%
  instructions       1.01T  ±  639K     1.01T  … 1.01T           0 ( 0%)        0%
  cache_references   1.57G  ± 28.1M     1.54G  … 1.59G           0 ( 0%)        0%
  cache_misses       46.0M  ± 4.86M     42.0M  … 51.4M           0 ( 0%)        0%
  branch_misses       141M  ± 47.9K      141M  …  141M           0 ( 0%)        0%
Benchmark 2 (3 runs): bench-final
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.58s  ± 2.60ms    2.57s  … 2.58s           0 ( 0%)        ⚡- 97.3% ±  0.5%
  peak_rss            111MB ±  108KB     110MB …  111MB          0 ( 0%)        💩+107.9% ±  0.5%
  cpu_cycles         9.41G  ± 9.30M     9.40G  … 9.42G           0 ( 0%)        ⚡- 97.8% ±  0.4%
  instructions       27.0G  ±  162K     27.0G  … 27.0G           0 ( 0%)        ⚡- 97.3% ±  0.0%
  cache_references    107M  ± 1.19M      107M  …  109M           0 ( 0%)        ⚡- 93.2% ±  2.9%
  cache_misses       5.21M  ± 91.3K     5.12M  … 5.30M           0 ( 0%)        ⚡- 88.7% ± 16.9%
  branch_misses      10.5M  ± 11.1K     10.5M  … 10.5M           0 ( 0%)        ⚡- 92.5% ±  0.1%

At this point we are more or less IO bound.

Oct 27 '24 23:10 Sekky61

I should have mentioned that the configuration options in the benchmarking program are meant to be used with a Release Build (zig build bench -Doptimize=ReleaseSafe). Waiting a few minutes for benchmark results can be quite annoying.

Anyway, the removal of lookupSymbolGlobal made for a nice improvement!

Nov 03 '24 19:11 Techatrix

Code action to organize imports

Organize Imports Diagnostic and Code Action

Design

Diagnostic

Code Action

Considerations and notes

Unresolved questions

Move alias to the end of each section

Move alias after all imports