Vectorize IndexOfAnyExcept<T>(T value)
Contributes to https://github.com/dotnet/runtime/issues/67942
[Params(1, 4, 16, 64, 256, 1024)]
public int Length { get; set; }
private byte[] _zeros;
[GlobalSetup]
public void Setup() => _zeros = new byte[Length];
[Benchmark]
public bool AllZeroTrue() => _zeros.AsSpan().IndexOfAnyExcept((byte)0) < 0;
| Method | Toolchain | Length | Mean | Ratio | Code Size |
|---|---|---|---|---|---|
| AllZeroTrue | \main\corerun.exe | 1 | 0.6482 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 1 | 1.7013 ns | 2.62 | 242 B |
| AllZeroTrue | \main\corerun.exe | 4 | 2.6035 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 4 | 3.5931 ns | 1.38 | 242 B |
| AllZeroTrue | \main\corerun.exe | 16 | 4.8521 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 16 | 2.3925 ns | 0.49 | 242 B |
| AllZeroTrue | \main\corerun.exe | 64 | 18.6816 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 64 | 3.5615 ns | 0.19 | 242 B |
| AllZeroTrue | \main\corerun.exe | 256 | 81.8503 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 256 | 10.3610 ns | 0.13 | 242 B |
| AllZeroTrue | \main\corerun.exe | 1024 | 310.6106 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 1024 | 37.1116 ns | 0.12 | 242 B |
The one thing that makes me a tad hesitant is the case of the except value being found at the very beginning, in which case this does incur a (very small) penalty even for really long inputs, e.g.
[Params(1024)]
public int Length { get; set; }
private byte[] _allBitsSet;
[GlobalSetup]
public void Setup() => _allBitsSet = Enumerable.Repeat((byte)0xFF, Length).ToArray();
[Benchmark]
public bool AllBitsSetLookingForNon0() => _allBitsSet.AsSpan().IndexOfAnyExcept((byte)0) < 0;
| Method | Toolchain | Length | Mean | Ratio | RatioSD | Code Size |
|---|---|---|---|---|---|---|
| AllBitsSetLookingForNon0 | \main\corerun.exe | 1024 | 0.4979 ns | 1.00 | 0.00 | 72 B |
| AllBitsSetLookingForNon0 | \pr\corerun.exe | 1024 | 1.7421 ns | 3.51 | 0.11 | 242 B |
I'm not sure what if anything to do about it.
Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.
Issue Details
Contributes to https://github.com/dotnet/runtime/issues/67942
[Params(1, 4, 16, 64, 256, 1024)]
public int Length { get; set; }
private byte[] _zeros;
[GlobalSetup]
public void Setup() => _zeros = new byte[Length];
[Benchmark]
public bool AllZeroTrue() => _zeros.AsSpan().IndexOfAnyExcept((byte)0) < 0;
| Method | Toolchain | Length | Mean | Ratio | Code Size |
|---|---|---|---|---|---|
| AllZeroTrue | \main\corerun.exe | 1 | 0.6482 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 1 | 1.7013 ns | 2.62 | 242 B |
| AllZeroTrue | \main\corerun.exe | 4 | 2.6035 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 4 | 3.5931 ns | 1.38 | 242 B |
| AllZeroTrue | \main\corerun.exe | 16 | 4.8521 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 16 | 2.3925 ns | 0.49 | 242 B |
| AllZeroTrue | \main\corerun.exe | 64 | 18.6816 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 64 | 3.5615 ns | 0.19 | 242 B |
| AllZeroTrue | \main\corerun.exe | 256 | 81.8503 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 256 | 10.3610 ns | 0.13 | 242 B |
| AllZeroTrue | \main\corerun.exe | 1024 | 310.6106 ns | 1.00 | 72 B |
| AllZeroTrue | \pr\corerun.exe | 1024 | 37.1116 ns | 0.12 | 242 B |
The one thing that makes me a tad hesitant is the case of the except value being found at the very beginning, in which case this does incur a (very small) penalty even for really long inputs, e.g.
[Params(1024)]
public int Length { get; set; }
private byte[] _allBitsSet;
[GlobalSetup]
public void Setup() => _allBitsSet = Enumerable.Repeat((byte)0xFF, Length).ToArray();
[Benchmark]
public bool AllBitsSetLookingForNon0() => _allBitsSet.AsSpan().IndexOfAnyExcept((byte)0) < 0;
| Method | Toolchain | Length | Mean | Ratio | RatioSD | Code Size |
|---|---|---|---|---|---|---|
| AllBitsSetLookingForNon0 | \main\corerun.exe | 1024 | 0.4979 ns | 1.00 | 0.00 | 72 B |
| AllBitsSetLookingForNon0 | \pr\corerun.exe | 1024 | 1.7421 ns | 3.51 | 0.11 | 242 B |
I'm not sure what if anything to do about it.
| Author: | stephentoub |
|---|---|
| Assignees: | - |
| Labels: |
|
| Milestone: | 7.0.0 |
Failure is https://github.com/dotnet/runtime/issues/73247
@adamsitnik, can you please review this? Thanks.
Thanks for reviewing, Adam.
I'm not sure what if anything to do about it.
I think this is an acceptable tradeoff. The measured difference is just over 1ns which would be about 2-4 clock cycles on most modern computers, this is likely representative of the jump to IndexOfAnyExceptValueType and branch cost for checking the length.
In practice, the actual spilling of args, call, and general memory latency is more than this so most users likely won't see a difference. It's likewise a general avenue we might want to look at improved JIT support around, for being able to ensure that cost for such checks can be minimized.
One think that might help would be making it so that you have a pattern like the following and marking the containing method as AggressiveInlining
if (length < Vector128<T>.Count)
{
ScalarImpl();
}
else
{
VectorImpl();
}
This would theoretically allow the JIT to hoist, constant fold, or otherwise mitigate the branch for various input kinds.
this is likely representative of the jump to IndexOfAnyExceptValueType and branch cost for checking the length.
To be clear, I wasn't referring to the case where there are fewer than Count elements, rather the case where there are more than Count elements and the very first element doesn't match the specified value. I expect that to be a fairly common occurrence with some uses of this method.
Hm.. looks like this regressed System.Collections.Concurrent.IsEmpty<String>.Dictionary(Size: 512) but none regressions were filed

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fSystem.Collections.Concurrent.IsEmpty(String).Dictionary(Size%3a%20512).html
(I blamed this PR because ConcurrentDictionary uses IndexOfAnyExcept for AreAllBucketsEmpty)
looks like this regressed System.Collections.Concurrent.IsEmpty<String>.Dictionary(Size: 512) but none regressions were filed
By a nanosecond or two, right? That's largely expected. I'd expect IsEmpty to get a tad slower when the dictionary isn't empty and faster when it is empty; basically an extra branch or two in the check.
I totally agree and didn't file an issue for it, it's just needed for perf study report