client_java

Zero gc labels lookup

Open brian-brazil opened this issue 6 years ago • 2 comments

@njhill @franz1981 @Falland

I've finally had time to dig into the various PRs (#445, #459, #460). What I've done is take the benchmarks from #460, use #445 as a base, and then eliminate the remaining allocations from it. #460 still has notably better performance; however, I remain uncomfortable with a bespoke hashtable implementation, and Java's profiling tools aren't giving me enough to figure out why this version is slower.

As a side effect, it's also simpler to add support for other numbers of labels in the future (not that even 4 labels should be common in the first place).
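To illustrate the general idea of an allocation-free lookup (this is not the code from this PR), here is a minimal sketch. It assumes a per-thread reusable key and fixed-arity `labels(...)` overloads, so that neither a varargs array nor a key object is allocated on a hit. `PooledLabelLookup`, `LabelKey`, and the use of `LongAdder` as the child type are hypothetical names chosen for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: a thread-local scratch key avoids allocating on lookup hits.
final class PooledLabelLookup {

    // Hypothetical key type: a fixed-capacity array plus the number of slots in use.
    static final class LabelKey {
        final String[] values;
        int size;

        LabelKey(int capacity) {
            values = new String[capacity];
        }

        // Immutable-by-convention copy, made only when a new child is inserted.
        LabelKey copy() {
            LabelKey k = new LabelKey(size);
            System.arraycopy(values, 0, k.values, 0, size);
            k.size = size;
            return k;
        }

        @Override
        public int hashCode() {
            int h = 1;
            for (int i = 0; i < size; i++) {
                h = 31 * h + values[i].hashCode(); // assumes non-null label values
            }
            return h;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof LabelKey)) return false;
            LabelKey other = (LabelKey) o;
            if (other.size != size) return false;
            for (int i = 0; i < size; i++) {
                if (!values[i].equals(other.values[i])) return false;
            }
            return true;
        }
    }

    private static final int MAX_POOLED = 4; // assumption: overloads exist up to this arity

    // One reusable scratch key per thread; it is never stored in the map.
    private final ThreadLocal<LabelKey> scratch =
            ThreadLocal.withInitial(() -> new LabelKey(MAX_POOLED));

    private final ConcurrentMap<LabelKey, LongAdder> children = new ConcurrentHashMap<>();

    LongAdder labels(String v1) {
        LabelKey k = scratch.get();
        k.values[0] = v1;
        k.size = 1;
        return lookup(k);
    }

    LongAdder labels(String v1, String v2) {
        LabelKey k = scratch.get();
        k.values[0] = v1;
        k.values[1] = v2;
        k.size = 2;
        return lookup(k);
    }

    private LongAdder lookup(LabelKey k) {
        LongAdder child = children.get(k); // hit path: no allocation
        if (child != null) return child;
        LongAdder created = new LongAdder();
        // Miss path: copy the scratch key so the stored key is never mutated.
        LongAdder prev = children.putIfAbsent(k.copy(), created);
        return prev != null ? prev : created;
    }

    public static void main(String[] args) {
        PooledLabelLookup metric = new PooledLabelLookup();
        metric.labels("GET").increment();
        metric.labels("GET", "200").increment();
        System.out.println(metric.labels("GET").sum());
    }
}
```

In this sketch, adding another arity means adding one more overload and raising `MAX_POOLED`; any label count without an overload would have to fall back to an allocating path.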

brian-brazil, Jun 19 '19 13:06

Existing:

Benchmark                                                                  Mode  Cnt     Score   Error   Units
LabelsToChildLookupBenchmark.baseline                                      avgt         11.091           ns/op
LabelsToChildLookupBenchmark.baseline:·gc.alloc.rate                       avgt          0.001          MB/sec
LabelsToChildLookupBenchmark.baseline:·gc.alloc.rate.norm                  avgt         ≈ 10⁻⁵            B/op
LabelsToChildLookupBenchmark.baseline:·gc.count                            avgt            ≈ 0          counts
LabelsToChildLookupBenchmark.fiveLabels                                    avgt         75.175           ns/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.alloc.rate                     avgt       1636.069          MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.alloc.rate.norm                avgt        128.000            B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Eden_Space            avgt       1794.610          MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Eden_Space.norm       avgt        140.404            B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Survivor_Space        avgt          0.031          MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Survivor_Space.norm   avgt          0.002            B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.count                          avgt          3.000          counts
LabelsToChildLookupBenchmark.fiveLabels:·gc.time                           avgt          3.000              ms
LabelsToChildLookupBenchmark.fourLabels                                    avgt         52.120           ns/op
LabelsToChildLookupBenchmark.fourLabels:·gc.alloc.rate                     avgt       1021.469          MB/sec
LabelsToChildLookupBenchmark.fourLabels:·gc.alloc.rate.norm                avgt         56.000            B/op
LabelsToChildLookupBenchmark.fourLabels:·gc.churn.PS_Eden_Space            avgt       1003.297          MB/sec
LabelsToChildLookupBenchmark.fourLabels:·gc.churn.PS_Eden_Space.norm       avgt         55.004            B/op
LabelsToChildLookupBenchmark.fourLabels:·gc.churn.PS_Survivor_Space        avgt          0.062          MB/sec
LabelsToChildLookupBenchmark.fourLabels:·gc.churn.PS_Survivor_Space.norm   avgt          0.003            B/op
LabelsToChildLookupBenchmark.fourLabels:·gc.count                          avgt          2.000          counts
LabelsToChildLookupBenchmark.fourLabels:·gc.time                           avgt          2.000              ms
LabelsToChildLookupBenchmark.oneLabel                                      avgt         36.335           ns/op
LabelsToChildLookupBenchmark.oneLabel:·gc.alloc.rate                       avgt       1267.213          MB/sec
LabelsToChildLookupBenchmark.oneLabel:·gc.alloc.rate.norm                  avgt         48.000            B/op
LabelsToChildLookupBenchmark.oneLabel:·gc.churn.PS_Eden_Space              avgt        998.861          MB/sec
LabelsToChildLookupBenchmark.oneLabel:·gc.churn.PS_Eden_Space.norm         avgt         37.835            B/op
LabelsToChildLookupBenchmark.oneLabel:·gc.churn.PS_Survivor_Space          avgt          0.031          MB/sec
LabelsToChildLookupBenchmark.oneLabel:·gc.churn.PS_Survivor_Space.norm     avgt          0.001            B/op
LabelsToChildLookupBenchmark.oneLabel:·gc.count                            avgt          2.000          counts
LabelsToChildLookupBenchmark.oneLabel:·gc.time                             avgt          2.000              ms
LabelsToChildLookupBenchmark.threeLabels                                   avgt         41.139           ns/op
LabelsToChildLookupBenchmark.threeLabels:·gc.alloc.rate                    avgt       1296.388          MB/sec
LabelsToChildLookupBenchmark.threeLabels:·gc.alloc.rate.norm               avgt         56.000            B/op
LabelsToChildLookupBenchmark.threeLabels:·gc.churn.PS_Eden_Space           avgt       1006.037          MB/sec
LabelsToChildLookupBenchmark.threeLabels:·gc.churn.PS_Eden_Space.norm      avgt         43.458            B/op
LabelsToChildLookupBenchmark.threeLabels:·gc.churn.PS_Survivor_Space       avgt          0.062          MB/sec
LabelsToChildLookupBenchmark.threeLabels:·gc.churn.PS_Survivor_Space.norm  avgt          0.003            B/op
LabelsToChildLookupBenchmark.threeLabels:·gc.count                         avgt          2.000          counts
LabelsToChildLookupBenchmark.threeLabels:·gc.time                          avgt          1.000              ms
LabelsToChildLookupBenchmark.twoLabels                                     avgt         44.241           ns/op
LabelsToChildLookupBenchmark.twoLabels:·gc.alloc.rate                      avgt       1031.930          MB/sec
LabelsToChildLookupBenchmark.twoLabels:·gc.alloc.rate.norm                 avgt         48.000            B/op
LabelsToChildLookupBenchmark.twoLabels:·gc.churn.PS_Eden_Space             avgt       1004.470          MB/sec
LabelsToChildLookupBenchmark.twoLabels:·gc.churn.PS_Eden_Space.norm        avgt         46.723            B/op
LabelsToChildLookupBenchmark.twoLabels:·gc.churn.PS_Survivor_Space         avgt          0.031          MB/sec
LabelsToChildLookupBenchmark.twoLabels:·gc.churn.PS_Survivor_Space.norm    avgt          0.001            B/op
LabelsToChildLookupBenchmark.twoLabels:·gc.count                           avgt          2.000          counts
LabelsToChildLookupBenchmark.twoLabels:·gc.time                            avgt          2.000              ms

Now:

JMH options: -wi 1 -i 1 -f 1 -t 2 -prof gc

Benchmark                                                                 Mode  Cnt    Score   Error   Units
LabelsToChildLookupBenchmark.baseline                                     avgt        11.743           ns/op
LabelsToChildLookupBenchmark.baseline:·gc.alloc.rate                      avgt         0.001          MB/sec
LabelsToChildLookupBenchmark.baseline:·gc.alloc.rate.norm                 avgt        ≈ 10⁻⁵            B/op
LabelsToChildLookupBenchmark.baseline:·gc.count                           avgt           ≈ 0          counts
LabelsToChildLookupBenchmark.fiveLabels                                   avgt        68.673           ns/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.alloc.rate                    avgt       562.481          MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.alloc.rate.norm               avgt        40.000            B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Eden_Space           avgt       501.133          MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Eden_Space.norm      avgt        35.637            B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Survivor_Space       avgt         1.336          MB/sec
LabelsToChildLookupBenchmark.fiveLabels:·gc.churn.PS_Survivor_Space.norm  avgt         0.095            B/op
LabelsToChildLookupBenchmark.fiveLabels:·gc.count                         avgt         2.000          counts
LabelsToChildLookupBenchmark.fiveLabels:·gc.time                          avgt         2.000              ms
LabelsToChildLookupBenchmark.fourLabels                                   avgt        48.986           ns/op
LabelsToChildLookupBenchmark.fourLabels:·gc.alloc.rate                    avgt         0.001          MB/sec
LabelsToChildLookupBenchmark.fourLabels:·gc.alloc.rate.norm               avgt        ≈ 10⁻⁴            B/op
LabelsToChildLookupBenchmark.fourLabels:·gc.count                         avgt           ≈ 0          counts
LabelsToChildLookupBenchmark.oneLabel                                     avgt        34.478           ns/op
LabelsToChildLookupBenchmark.oneLabel:·gc.alloc.rate                      avgt         0.001          MB/sec
LabelsToChildLookupBenchmark.oneLabel:·gc.alloc.rate.norm                 avgt        ≈ 10⁻⁴            B/op
LabelsToChildLookupBenchmark.oneLabel:·gc.count                           avgt           ≈ 0          counts
LabelsToChildLookupBenchmark.threeLabels                                  avgt        41.966           ns/op
LabelsToChildLookupBenchmark.threeLabels:·gc.alloc.rate                   avgt         0.001          MB/sec
LabelsToChildLookupBenchmark.threeLabels:·gc.alloc.rate.norm              avgt        ≈ 10⁻⁴            B/op
LabelsToChildLookupBenchmark.threeLabels:·gc.count                        avgt           ≈ 0          counts
LabelsToChildLookupBenchmark.twoLabels                                    avgt        43.499           ns/op
LabelsToChildLookupBenchmark.twoLabels:·gc.alloc.rate                     avgt         0.001          MB/sec
LabelsToChildLookupBenchmark.twoLabels:·gc.alloc.rate.norm                avgt        ≈ 10⁻⁴            B/op
LabelsToChildLookupBenchmark.twoLabels:·gc.count                          avgt           ≈ 0          counts

brian-brazil, Jun 19 '19 13:06

@brian-brazil I've opened #514; see what you think. The thread-local pooling logic in this PR could easily be encapsulated as an implementation of ConcurrentChildMap.

njhill, Nov 22 '19 06:11