More precise writebarrier for regions

Open PeterSolMS opened this issue 3 years ago • 2 comments

This introduces a lookup table for regions where we can find the current generation and the planned generation efficiently.

The table has byte-sized elements where the low nibble is the current generation and the high nibble is the planned generation.
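As a rough illustration of the encoding described above, here is a minimal sketch of such a table. The names (`RegionGenTable`, `kRegionShift`) and the 4 MB region size are assumptions made for this example only and are not taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption for illustration: regions are 4 MB (1 << 22 bytes).
constexpr unsigned kRegionShift = 22;

// Hypothetical per-region lookup table: one byte per region,
// low nibble = current generation, high nibble = planned generation.
struct RegionGenTable {
    std::uintptr_t base;                 // lowest heap address covered by the table
    std::vector<std::uint8_t> entries;   // one byte per region

    RegionGenTable(std::uintptr_t base_addr, std::size_t region_count)
        : base(base_addr), entries(region_count, 0) {}

    // Map any address inside a region to that region's table index.
    std::size_t index_of(std::uintptr_t addr) const {
        return static_cast<std::size_t>((addr - base) >> kRegionShift);
    }

    // Pack both generations into a single byte.
    void set(std::uintptr_t addr, unsigned current_gen, unsigned planned_gen) {
        entries[index_of(addr)] = static_cast<std::uint8_t>(
            ((planned_gen & 0x0Fu) << 4) | (current_gen & 0x0Fu));
    }

    unsigned current_gen(std::uintptr_t addr) const {
        return entries[index_of(addr)] & 0x0Fu;   // low nibble
    }

    unsigned planned_gen(std::uintptr_t addr) const {
        return entries[index_of(addr)] >> 4;      // high nibble
    }
};
```

With this layout, both generations for an address come from a single shift, one byte load, and a mask or shift, which is what makes the lookup cheap enough for a write barrier.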

The table is used in mark_through_cards_helper and in the write barriers. For now this covers only the most frequently used barriers; Array.Copy has its own way of setting cards, which I haven't fixed.

I have changed the write barrier to set only a single bit when a pointer to a younger generation is stored into an object in an older generation. This costs an interlocked operation when the bit is not already set. Hopefully this will be more than compensated for by the lower cost of card marking.
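The barrier logic described above can be sketched roughly as follows. This is not the runtime's actual barrier (which is hand-written assembly); the function names, the 256-byte card size, and the 32-bit card word layout are all assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption for illustration: one card bit covers 256 bytes of heap.
constexpr unsigned kCardShift = 8;

// Set the single card bit covering dst_addr.
// Returns true only if an interlocked operation was actually performed.
inline bool set_card_bit(std::atomic<std::uint32_t>* card_words,
                         std::uintptr_t heap_base,
                         std::uintptr_t dst_addr) {
    std::size_t card = static_cast<std::size_t>((dst_addr - heap_base) >> kCardShift);
    std::atomic<std::uint32_t>& word = card_words[card / 32];
    std::uint32_t mask = 1u << (card % 32);

    // Cheap plain load first: if the bit is already set, skip the interlocked op.
    if (word.load(std::memory_order_relaxed) & mask) {
        return false;
    }
    // Other threads may be setting neighbouring bits in the same word,
    // so the update must be an atomic OR rather than a plain store.
    word.fetch_or(mask, std::memory_order_relaxed);
    return true;
}

// Hypothetical barrier body: dst_gen is the generation of the object being
// written to, ref_gen the generation of the object the stored pointer refers to.
// Lower numbers are younger (gen0 is the youngest).
inline bool record_store(std::atomic<std::uint32_t>* card_words,
                         std::uintptr_t heap_base,
                         std::uintptr_t dst_addr,
                         unsigned dst_gen,
                         unsigned ref_gen) {
    if (ref_gen >= dst_gen) {
        return false;   // not an old-to-young pointer: no card needed
    }
    return set_card_bit(card_words, heap_base, dst_addr);
}
```

The trade-off is visible here: a store that sets a fresh bit pays for an atomic read-modify-write, but far fewer cards end up set, so the GC has less to scan during card marking.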

I haven't yet implemented committing only the part of the lookup table that is needed.

PeterSolMS, Mar 31 '22 16:03

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Issue Details

Author: PeterSolMS
Assignees: PeterSolMS
Labels: area-GC-coreclr

Milestone: -

msftbot[bot], Mar 31 '22 16:03

Results from running on a first-party production workload:

| Index | Metric | Baseline | New | Diff | Diff % |
|---|---|---|---|---|---|
| 3 | Process Duration (Sec) | 53,887.63 | 53,858.88 | -28.751 | -0.053 |
| 4 | Total Allocated MB | 16,252,981.58 | 15,819,053.88 | -433,927.69 | -2.67 |
| 5 | Max Size Peak MB | 26,878.44 | 27,017.04 | 138.607 | 0.516 |
| 6 | GC Count | 7,269.00 | 7,573.00 | 304 | 4.182 |
| 7 | Heap Count | 48 | 48 | 0 | 0 |
| 8 | Gen0 Count | 3,630.00 | 3,779.00 | 149 | 4.105 |
| 9 | Gen1 Count | 3,466.00 | 3,621.00 | 155 | 4.472 |
| 10 | Ephemeral Count | 7,096.00 | 7,400.00 | 304 | 4.284 |
| 11 | Gen2 Blocking Count | 4 | 4 | 0 | 0 |
| 12 | BGC Count | 169 | 169 | 0 | 0 |
| 13 | Gen0 Total Pause Time MSec | 269,123.47 | 231,960.87 | -37,162.60 | -13.81 |
| 14 | Gen1 Total Pause Time MSec | 355,468.13 | 337,579.93 | -17,888.20 | -5.032 |
| 15 | Ephemeral Total Pause Time MSec | 624,591.60 | 569,540.80 | -55,050.80 | -8.814 |
| 16 | Blocking Gen2 Total Pause Time MSec | 4,260.20 | 2,376.57 | -1,883.63 | -44.22 |
| 17 | BGC Total Pause Time MSec | 14,630.25 | 13,270.05 | -1,360.20 | -9.297 |
| 18 | GC Pause Time % | 1.194 | 1.087 | -0.108 | -9.011 |
| 19 | Avg. Gen0 Pause Time (ms) | 74.139 | 61.382 | -12.757 | -17.21 |
| 20 | Avg. Gen1 Pause Time (ms) | 102.559 | 93.228 | -9.33 | -9.097 |
| 21 | Avg. Gen0 Promoted (mb) | 170.597 | 166.073 | -4.524 | -2.652 |
| 22 | Avg. Gen1 Promoted (mb) | 343.586 | 332.5 | -11.087 | -3.227 |
| 23 | Avg. Gen0 Speed (mb/ms) | 2.301 | 2.706 | 0.405 | 17.58 |
| 24 | Avg. Gen1 Speed (mb/ms) | 3.35 | 3.567 | 0.216 | 6.458 |

Looking at 500 GCs during steady state as an example:

[image]

Maoni0, Jul 07 '22 06:07

This improved crossgen2 throughput:

[image: newplot - 2022-09-08T093600 033]

AndyAyersMS, Sep 08 '22 16:09