Use new Dictionary AlternateLookup to avoid allocation in DictionaryJumpTable
Use new Dictionary AlternateLookup to avoid allocation in DictionaryJumpTable
Description
Simple change which removes the substring allocation in DictionaryJumpTable using the GetAlternateLookup<ReadOnlySpan<char>> feature of .NET 9.
With this change all JumpTable implementations can accept ReadOnlySpan<char> as a parameter instead of string, PathSegment. I'd be happy to include that change if it's wanted.
Awesome, was looking forward to aspnetcore adopting the new alternate lookup feature.
Could you run the micro-benchmarks before and after this change? https://github.com/dotnet/aspnetcore/blob/main/src/Http/Routing/perf/Microbenchmarks/Matching/JumpTableMultipleEntryBenchmark.cs
Interestingly enough the PR looks to be slightly slower (5ns -> 11ns) with the benchmark as-is whereas I would have expected no difference.
However there's a flaw with the benchmark in that it currently always compares the entire string which is unusual and which negates the benefit proposed in this PR. In most real-world cases the PathSegment is only part of the string and so substrings are allocated.
I've modified the benchmark to account for this by using only part of the path string. The modified benchmark results are below. Overall less significant than I would have liked but it's faster and allocates less memory but only if we accept PathSegment usually points to a substring of Path. [edit: I confirmed that this is always the case since all paths start with /]
Before:
| Method | Count | Mean | Error | StdDev | Op/s | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 2 | 0.6304 ns | 0.0052 ns | 0.0049 ns | 1,586,354,557.5 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 2 | 13.9512 ns | 0.1030 ns | 0.0860 ns | 71,678,284.5 | 22.13 | 0.29 | 0.0011 | - | - | 82 B |
| Baseline | 5 | 0.6195 ns | 0.0067 ns | 0.0060 ns | 1,614,331,173.5 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 5 | 14.5809 ns | 0.1890 ns | 0.1578 ns | 68,582,745.6 | 23.55 | 0.32 | 0.0011 | - | - | 82 B |
| Baseline | 10 | 0.6169 ns | 0.0111 ns | 0.0104 ns | 1,620,884,005.6 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 10 | 15.2295 ns | 0.2681 ns | 0.3486 ns | 65,661,888.5 | 24.69 | 0.86 | 0.0011 | - | - | 82 B |
| Baseline | 25 | 0.6165 ns | 0.0060 ns | 0.0056 ns | 1,621,971,184.8 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 25 | 16.9301 ns | 0.2896 ns | 0.2568 ns | 59,066,511.0 | 27.47 | 0.46 | 0.0011 | - | - | 82 B |
| Baseline | 50 | 0.6172 ns | 0.0060 ns | 0.0056 ns | 1,620,306,284.5 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 50 | 18.1510 ns | 0.0577 ns | 0.0482 ns | 55,093,297.9 | 29.42 | 0.34 | 0.0011 | - | - | 82 B |
| Baseline | 100 | 0.6176 ns | 0.0070 ns | 0.0066 ns | 1,619,194,748.0 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 100 | 20.4224 ns | 0.1325 ns | 0.1174 ns | 48,965,762.2 | 33.04 | 0.38 | 0.0011 | - | - | 82 B |
After:
| Method | Count | Mean | Error | StdDev | Op/s | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 2 | 0.6283 ns | 0.0088 ns | 0.0082 ns | 1,591,566,799.9 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 2 | 11.8230 ns | 0.1413 ns | 0.1321 ns | 84,580,838.3 | 18.82 | 0.22 | - | - | - | - |
| Baseline | 5 | 0.6149 ns | 0.0052 ns | 0.0046 ns | 1,626,151,413.8 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 5 | 11.8060 ns | 0.1567 ns | 0.1466 ns | 84,702,576.7 | 19.19 | 0.33 | - | - | - | - |
| Baseline | 10 | 0.6141 ns | 0.0067 ns | 0.0063 ns | 1,628,406,229.3 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 10 | 12.0957 ns | 0.1984 ns | 0.1856 ns | 82,673,853.8 | 19.70 | 0.39 | - | - | - | - |
| Baseline | 25 | 0.6163 ns | 0.0044 ns | 0.0039 ns | 1,622,602,556.1 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 25 | 13.0903 ns | 0.2482 ns | 0.2200 ns | 76,392,361.3 | 21.24 | 0.34 | - | - | - | - |
| Baseline | 50 | 0.6159 ns | 0.0095 ns | 0.0089 ns | 1,623,584,223.2 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 50 | 14.6091 ns | 0.0292 ns | 0.0273 ns | 68,450,281.6 | 23.72 | 0.35 | - | - | - | - |
| Baseline | 100 | 0.6143 ns | 0.0078 ns | 0.0073 ns | 1,627,848,658.0 | 1.00 | 0.00 | - | - | - | - |
| Dictionary | 100 | 16.1624 ns | 0.0569 ns | 0.0475 ns | 61,871,831.2 | 26.28 | 0.34 | - | - | - | - |
I've created https://github.com/dotnet/runtime/pull/108732 which if accepted/merged should further improve the performance of the AlternateLookup used in this PR.
/azp run
Was waiting for the runtime PR so we could see the full perf gains. But since it's taking time to get a review we can just merge this since it's still goodness.
Command 'run
Was' is not supported by Azure Pipelines.
Supported commands
- help:
- Get descriptions, examples and documentation about supported commands
- Example: help "command_name"
- list:
- List all pipelines for this repository using a comment.
- Example: "list"
- run:
- Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
- Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
- where:
- Report back the Azure DevOps orgs that are related to this repository and org
- Example: "where"
See additional documentation.
/azp run
Azure Pipelines successfully started running 3 pipeline(s).
@BrennanConroy did we ever re-run the benchmarks?
Not 100% apples-to-apples comparison since I'd need to run against an older .NET 10 runtime. But I ran the alternate lookup code on 9.0 vs. 10.0 p4 and 10.0 is about 2x faster than the 9.0 version.
net9.0
| Method | Count | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|---|
| Baseline | 2 | 0.8147 ns | 0.0112 ns | 0.0105 ns | 1.00 | 0.00 |
| Dictionary | 2 | 18.1174 ns | 0.1606 ns | 0.1503 ns | 22.24 | 0.33 |
| Baseline | 5 | 0.7959 ns | 0.0144 ns | 0.0121 ns | 1.00 | 0.00 |
| Dictionary | 5 | 20.7396 ns | 0.2838 ns | 0.2216 ns | 26.08 | 0.52 |
| Baseline | 10 | 0.8101 ns | 0.0164 ns | 0.0176 ns | 1.00 | 0.00 |
| Dictionary | 10 | 20.7011 ns | 0.1297 ns | 0.1083 ns | 25.40 | 0.61 |
| Baseline | 25 | 0.8065 ns | 0.0165 ns | 0.0146 ns | 1.00 | 0.00 |
| Dictionary | 25 | 20.7905 ns | 0.3887 ns | 0.3446 ns | 25.79 | 0.72 |
| Baseline | 50 | 0.7988 ns | 0.0063 ns | 0.0059 ns | 1.00 | 0.00 |
| Dictionary | 50 | 23.0723 ns | 0.3425 ns | 0.2674 ns | 28.95 | 0.28 |
| Baseline | 100 | 0.8038 ns | 0.0160 ns | 0.0134 ns | 1.00 | 0.00 |
| Dictionary | 100 | 26.6376 ns | 0.1442 ns | 0.1278 ns | 33.17 | 0.55 |
net10
| Method | Count | Mean | Error | StdDev | Ratio | RatioSD |
|---|---|---|---|---|---|---|
| Baseline | 2 | 0.7341 ns | 0.0063 ns | 0.0059 ns | 1.00 | 0.00 |
| Dictionary | 2 | 8.7388 ns | 0.1193 ns | 0.1116 ns | 11.91 | 0.22 |
| Baseline | 5 | 0.9253 ns | 0.0117 ns | 0.0098 ns | 1.00 | 0.00 |
| Dictionary | 5 | 9.2274 ns | 0.0306 ns | 0.0272 ns | 9.97 | 0.09 |
| Baseline | 10 | 0.9326 ns | 0.0095 ns | 0.0074 ns | 1.00 | 0.00 |
| Dictionary | 10 | 10.0774 ns | 0.1035 ns | 0.0968 ns | 10.81 | 0.14 |
| Baseline | 25 | 0.9305 ns | 0.0188 ns | 0.0184 ns | 1.00 | 0.00 |
| Dictionary | 25 | 11.5418 ns | 0.0854 ns | 0.0667 ns | 12.35 | 0.25 |
| Baseline | 50 | 0.9207 ns | 0.0041 ns | 0.0040 ns | 1.00 | 0.00 |
| Dictionary | 50 | 13.8788 ns | 0.0925 ns | 0.0866 ns | 15.08 | 0.07 |
| Baseline | 100 | 0.9530 ns | 0.0192 ns | 0.0228 ns | 1.00 | 0.00 |
| Dictionary | 100 | 18.2492 ns | 0.1495 ns | 0.1398 ns | 19.13 | 0.55 |