More precise write barrier for regions
This introduces a lookup table for regions where we can find the current generation and the planned generation efficiently.
The table has byte-sized elements where the low nibble is the current generation and the high nibble is the planned generation.
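As a rough illustration of the nibble layout described above, here is a minimal C++ sketch. It is not the code from this change: the `RegionGenTable` type, its member names, and the 4 MB region size (`kRegionShift`) are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical region size (4 MB), chosen only for illustration.
constexpr unsigned kRegionShift = 22;

// Hypothetical lookup table: one byte per region.
// Low nibble = current generation, high nibble = planned generation.
struct RegionGenTable {
    std::uintptr_t base;                // lowest address covered by the table
    std::vector<std::uint8_t> entries;  // one entry per region

    RegionGenTable(std::uintptr_t base_address, std::size_t region_count)
        : base(base_address), entries(region_count, 0) {}

    // Map an address to the index of the region that contains it.
    std::size_t index_of(std::uintptr_t address) const {
        return (address - base) >> kRegionShift;
    }

    // Pack both generations into a single byte.
    void set(std::uintptr_t address, unsigned current_gen, unsigned planned_gen) {
        entries[index_of(address)] = static_cast<std::uint8_t>(
            ((planned_gen & 0x0F) << 4) | (current_gen & 0x0F));
    }

    unsigned current_gen(std::uintptr_t address) const {
        return entries[index_of(address)] & 0x0F;
    }

    unsigned planned_gen(std::uintptr_t address) const {
        return entries[index_of(address)] >> 4;
    }
};
```

With this layout, both generations for any address come from a single byte load plus a mask or a shift, which is what makes the lookup cheap enough for hot paths.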
The table is used in mark_through_cards_helper and in the write barriers. For now, only the most frequently used barriers are covered; Array.Copy has its own way of setting cards, which I haven't fixed.
I have changed the write barrier to set only a single bit when a pointer to a younger generation is stored into an object in an older generation. This costs an interlocked operation when the bit is not already set. Hopefully, this will be more than offset by the lower cost of card marking.
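The trade-off can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the actual barrier, which is not shown in this description: `mark_card_bit`, `needs_card`, the 256-byte card size, the 32-bit card words, and the relaxed memory ordering are all hypothetical choices for the example. It relies only on the fact that lower generation numbers are younger in the .NET GC.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical card granularity: one card bit covers 256 bytes.
constexpr unsigned kCardShift = 8;

// Only stores that create an old-to-young pointer need a card.
// Lower generation numbers are younger (gen0 is the youngest).
inline bool needs_card(unsigned source_gen, unsigned target_gen) {
    return target_gen < source_gen;
}

// Sets the single card bit covering `address`.
// Returns true if an interlocked operation was needed, false if the
// bit was already set and a plain load was enough.
inline bool mark_card_bit(std::atomic<std::uint32_t>* card_words,
                          std::uintptr_t base, std::uintptr_t address) {
    std::size_t card = (address - base) >> kCardShift;
    std::atomic<std::uint32_t>& word = card_words[card / 32];
    std::uint32_t bit = std::uint32_t(1) << (card % 32);

    // Cheap path: the bit is already set, so skip the interlocked op.
    if (word.load(std::memory_order_relaxed) & bit) {
        return false;
    }

    // Other bits in the same word may be set concurrently by other
    // threads, so the update must be atomic.
    word.fetch_or(bit, std::memory_order_relaxed);
    return true;
}
```

The check before `fetch_or` is what limits the interlocked cost to the first store that dirties a given card; later stores to the same card only pay for a load.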
I haven't yet implemented committing only the part of the lookup table that is needed.
| Author: | PeterSolMS |
|---|---|
| Assignees: | PeterSolMS |
| Labels: | |
| Milestone: | - |
Results from running on a first-party production workload:
| index | Metric | Baseline | New | Diff | Diff % |
|---|---|---|---|---|---|
| 3 | Process Duration (Sec) | 53,887.63 | 53,858.88 | -28.751 | -0.053 |
| 4 | Total Allocated MB | 16,252,981.58 | 15,819,053.88 | -433,927.69 | -2.67 |
| 5 | Max Size Peak MB | 26,878.44 | 27,017.04 | 138.607 | 0.516 |
| 6 | GC Count | 7,269.00 | 7,573.00 | 304 | 4.182 |
| 7 | Heap Count | 48 | 48 | 0 | 0 |
| 8 | Gen0 Count | 3,630.00 | 3,779.00 | 149 | 4.105 |
| 9 | Gen1 Count | 3,466.00 | 3,621.00 | 155 | 4.472 |
| 10 | Ephemeral Count | 7,096.00 | 7,400.00 | 304 | 4.284 |
| 11 | Gen2 Blocking Count | 4 | 4 | 0 | 0 |
| 12 | BGC Count | 169 | 169 | 0 | 0 |
| 13 | Gen0 Total Pause Time MSec | 269,123.47 | 231,960.87 | -37,162.60 | -13.81 |
| 14 | Gen1 Total Pause Time MSec | 355,468.13 | 337,579.93 | -17,888.20 | -5.032 |
| 15 | Ephemeral Total Pause Time MSec | 624,591.60 | 569,540.80 | -55,050.80 | -8.814 |
| 16 | Blocking Gen2 Total Pause Time MSec | 4,260.20 | 2,376.57 | -1,883.63 | -44.22 |
| 17 | BGC Total Pause Time MSec | 14,630.25 | 13,270.05 | -1,360.20 | -9.297 |
| 18 | GC Pause Time % | 1.194 | 1.087 | -0.108 | -9.011 |
| 19 | Avg. Gen0 Pause Time (ms) | 74.139 | 61.382 | -12.757 | -17.21 |
| 20 | Avg. Gen1 Pause Time (ms) | 102.559 | 93.228 | -9.33 | -9.097 |
| 21 | Avg. Gen0 Promoted (mb) | 170.597 | 166.073 | -4.524 | -2.652 |
| 22 | Avg. Gen1 Promoted (mb) | 343.586 | 332.5 | -11.087 | -3.227 |
| 23 | Avg. Gen0 Speed (mb/ms) | 2.301 | 2.706 | 0.405 | 17.58 |
| 24 | Avg. Gen1 Speed (mb/ms) | 3.35 | 3.567 | 0.216 | 6.458 |
Looking at 500 GCs during steady state as an example:

This improved crossgen2 throughput:
