Zero-GC labels lookup
@njhill @franz1981 @Falland
I've finally had time to dig into the various PRs (#445, #459, #460). I took the benchmarks from #460, used #445 as a base, and then eliminated the remaining allocations from it. #460 still performs notably better; however, I remain uncomfortable with a bespoke hashtable implementation, and Java's profiling tools aren't giving me enough to figure out why this version is slower.
As a side effect, it's also simpler to add other numbers of labels in the future (not that even 4 should be common in the first place).
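To illustrate the general idea of an allocation-free lookup with a thread-local reusable key, here is a minimal sketch. It is not the code in this PR: `LabelLookupSketch`, `Key`, and `SCRATCH` are made-up names, only the one- and two-label overloads are shown, and label values are assumed to be non-null.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch: child lookup by label values without per-call allocation. */
final class LabelLookupSketch<C> {

    /** Map key over the first {@code size} label values; equality is content-based. */
    static final class Key {
        final String[] values;
        int size;

        Key(int capacity) {
            this.values = new String[capacity];
        }

        /** Immutable-by-convention copy, made only when a new child is stored. */
        Key copy() {
            Key k = new Key(size);
            System.arraycopy(values, 0, k.values, 0, size);
            k.size = size;
            return k;
        }

        @Override
        public int hashCode() {
            int h = 1;
            for (int i = 0; i < size; i++) {
                h = 31 * h + values[i].hashCode(); // String caches its hash
            }
            return h;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            if (other.size != size) {
                return false;
            }
            for (int i = 0; i < size; i++) {
                if (!values[i].equals(other.values[i])) {
                    return false;
                }
            }
            return true;
        }
    }

    // One reusable scratch key per thread, sized for the largest overload below.
    private static final ThreadLocal<Key> SCRATCH =
            ThreadLocal.withInitial(() -> new Key(2));

    private final ConcurrentHashMap<Key, C> children = new ConcurrentHashMap<>();
    private final Supplier<C> childFactory;

    LabelLookupSketch(Supplier<C> childFactory) {
        this.childFactory = childFactory;
    }

    // Fixed-arity overloads avoid the array that a varargs call would allocate.
    C labels(String v1) {
        Key key = SCRATCH.get();
        key.values[0] = v1;
        key.size = 1;
        return lookup(key);
    }

    C labels(String v1, String v2) {
        Key key = SCRATCH.get();
        key.values[0] = v1;
        key.values[1] = v2;
        key.size = 2;
        return lookup(key);
    }

    private C lookup(Key scratch) {
        C child = children.get(scratch); // hit path: no allocation
        if (child != null) {
            return child;
        }
        // Miss path: copy the key before storing it, since the scratch key is reused.
        Key stored = scratch.copy();
        C created = childFactory.get();
        C raced = children.putIfAbsent(stored, created);
        return raced != null ? raced : created;
    }
}
```

The scratch key never escapes `lookup`, so reusing it per thread is safe; allocation happens only on the first sighting of a label combination. Supporting another label count in this sketch means adding one more overload and enlarging the scratch key.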
Existing:
Benchmark Mode Cnt Score Error Units
LabelsToChildLookupBenchmark.baseline avgt 11.091 ns/op
LabelsToChildLookupBenchmark.baseline:·gc.alloc.rate avgt 0.001 MB/sec
LabelsToChildLookupBenchmark.baseline:·gc.alloc.rate.norm avgt ≈ 10⁻⁵ B/op
LabelsToChildLookupBenchmark.baseline:·gc.count avgt ≈ 0 counts
LabelsToChildLookupBenchmark.fiveLabels avgt 75.175 ns/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.alloc.rate avgt 1636.069 MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.alloc.rate.norm avgt 128.000 B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Eden_Space avgt 1794.610 MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Eden_Space.norm avgt 140.404 B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Survivor_Space avgt 0.031 MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Survivor_Space.norm avgt 0.002 B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.count avgt 3.000 counts
LabelsToChildLookupBenchmark.fiveLabels:·gc.time avgt 3.000 ms
LabelsToChildLookupBenchmark.fourLabels avgt 52.120 ns/op
LabelsToChildLookupBenchmark.fourLabels:·gc.alloc.rate avgt 1021.469 MB/sec
LabelsToChildLookupBenchmark.fourLabels:·gc.alloc.rate.norm avgt 56.000 B/op
LabelsToChildLookupBenchmark.fourLabels:·gc.churn.PS_Eden_Space avgt 1003.297 MB/sec
LabelsToChildLookupBenchmark.fourLabels:·gc.churn.PS_Eden_Space.norm avgt 55.004 B/op
LabelsToChildLookupBenchmark.fourLabels:·gc.churn.PS_Survivor_Space avgt 0.062 MB/sec
LabelsToChildLookupBenchmark.fourLabels:·gc.churn.PS_Survivor_Space.norm avgt 0.003 B/op
LabelsToChildLookupBenchmark.fourLabels:·gc.count avgt 2.000 counts
LabelsToChildLookupBenchmark.fourLabels:·gc.time avgt 2.000 ms
LabelsToChildLookupBenchmark.oneLabel avgt 36.335 ns/op
LabelsToChildLookupBenchmark.oneLabel:·gc.alloc.rate avgt 1267.213 MB/sec
LabelsToChildLookupBenchmark.oneLabel:·gc.alloc.rate.norm avgt 48.000 B/op
LabelsToChildLookupBenchmark.oneLabel:·gc.churn.PS_Eden_Space avgt 998.861 MB/sec
LabelsToChildLookupBenchmark.oneLabel:·gc.churn.PS_Eden_Space.norm avgt 37.835 B/op
LabelsToChildLookupBenchmark.oneLabel:·gc.churn.PS_Survivor_Space avgt 0.031 MB/sec
LabelsToChildLookupBenchmark.oneLabel:·gc.churn.PS_Survivor_Space.norm avgt 0.001 B/op
LabelsToChildLookupBenchmark.oneLabel:·gc.count avgt 2.000 counts
LabelsToChildLookupBenchmark.oneLabel:·gc.time avgt 2.000 ms
LabelsToChildLookupBenchmark.threeLabels avgt 41.139 ns/op
LabelsToChildLookupBenchmark.threeLabels:·gc.alloc.rate avgt 1296.388 MB/sec
LabelsToChildLookupBenchmark.threeLabels:·gc.alloc.rate.norm avgt 56.000 B/op
LabelsToChildLookupBenchmark.threeLabels:·gc.churn.PS_Eden_Space avgt 1006.037 MB/sec
LabelsToChildLookupBenchmark.threeLabels:·gc.churn.PS_Eden_Space.norm avgt 43.458 B/op
LabelsToChildLookupBenchmark.threeLabels:·gc.churn.PS_Survivor_Space avgt 0.062 MB/sec
LabelsToChildLookupBenchmark.threeLabels:·gc.churn.PS_Survivor_Space.norm avgt 0.003 B/op
LabelsToChildLookupBenchmark.threeLabels:·gc.count avgt 2.000 counts
LabelsToChildLookupBenchmark.threeLabels:·gc.time avgt 1.000 ms
LabelsToChildLookupBenchmark.twoLabels avgt 44.241 ns/op
LabelsToChildLookupBenchmark.twoLabels:·gc.alloc.rate avgt 1031.930 MB/sec
LabelsToChildLookupBenchmark.twoLabels:·gc.alloc.rate.norm avgt 48.000 B/op
LabelsToChildLookupBenchmark.twoLabels:·gc.churn.PS_Eden_Space avgt 1004.470 MB/sec
LabelsToChildLookupBenchmark.twoLabels:·gc.churn.PS_Eden_Space.norm avgt 46.723 B/op
LabelsToChildLookupBenchmark.twoLabels:·gc.churn.PS_Survivor_Space avgt 0.031 MB/sec
LabelsToChildLookupBenchmark.twoLabels:·gc.churn.PS_Survivor_Space.norm avgt 0.001 B/op
LabelsToChildLookupBenchmark.twoLabels:·gc.count avgt 2.000 counts
LabelsToChildLookupBenchmark.twoLabels:·gc.time avgt 2.000 ms
Now:
JMH options: -wi 1 -i 1 -f 1 -t 2 -prof gc
Benchmark Mode Cnt Score Error Units
LabelsToChildLookupBenchmark.baseline avgt 11.743 ns/op
LabelsToChildLookupBenchmark.baseline:·gc.alloc.rate avgt 0.001 MB/sec
LabelsToChildLookupBenchmark.baseline:·gc.alloc.rate.norm avgt ≈ 10⁻⁵ B/op
LabelsToChildLookupBenchmark.baseline:·gc.count avgt ≈ 0 counts
LabelsToChildLookupBenchmark.fiveLabels avgt 68.673 ns/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.alloc.rate avgt 562.481 MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.alloc.rate.norm avgt 40.000 B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Eden_Space avgt 501.133 MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Eden_Space.norm avgt 35.637 B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Survivor_Space avgt 1.336 MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Survivor_Space.norm avgt 0.095 B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.count avgt 2.000 counts
LabelsToChildLookupBenchmark.fiveLabels:·gc.time avgt 2.000 ms
LabelsToChildLookupBenchmark.fourLabels avgt 48.986 ns/op
LabelsToChildLookupBenchmark.fourLabels:·gc.alloc.rate avgt 0.001 MB/sec
LabelsToChildLookupBenchmark.fourLabels:·gc.alloc.rate.norm avgt ≈ 10⁻⁴ B/op
LabelsToChildLookupBenchmark.fourLabels:·gc.count avgt ≈ 0 counts
LabelsToChildLookupBenchmark.oneLabel avgt 34.478 ns/op
LabelsToChildLookupBenchmark.oneLabel:·gc.alloc.rate avgt 0.001 MB/sec
LabelsToChildLookupBenchmark.oneLabel:·gc.alloc.rate.norm avgt ≈ 10⁻⁴ B/op
LabelsToChildLookupBenchmark.oneLabel:·gc.count avgt ≈ 0 counts
LabelsToChildLookupBenchmark.threeLabels avgt 41.966 ns/op
LabelsToChildLookupBenchmark.threeLabels:·gc.alloc.rate avgt 0.001 MB/sec
LabelsToChildLookupBenchmark.threeLabels:·gc.alloc.rate.norm avgt ≈ 10⁻⁴ B/op
LabelsToChildLookupBenchmark.threeLabels:·gc.count avgt ≈ 0 counts
LabelsToChildLookupBenchmark.twoLabels avgt 43.499 ns/op
LabelsToChildLookupBenchmark.twoLabels:·gc.alloc.rate avgt 0.001 MB/sec
LabelsToChildLookupBenchmark.twoLabels:·gc.alloc.rate.norm avgt ≈ 10⁻⁴ B/op
LabelsToChildLookupBenchmark.twoLabels:·gc.count avgt ≈ 0 counts
@brian-brazil I've opened #514; see what you think. The thread-local pooling logic in this PR could easily be encapsulated as an implementation of ConcurrentChildMap.