[Intel-SIG] Fix sched domain build error for GNR, CWF in SNC-3 mode
While testing Granite Rapids (GNR) and Clearwater Forest (CWF) systems in SNC-3 mode, we encountered sched domain build errors in dmesg. The scheduler domain code did not expect asymmetric node distances from a local node to multiple nodes in a remote package. As a result, remote nodes were partially grouped with local nodes in asymmetric groupings, which created too many levels in the NUMA sched domain hierarchy.
To address this, we simplify remote node distances for the purpose of sched domain construction on GNR and CWF. Specifically, we replace the individual distances to nodes within the same remote package with their average distance. This resolves the domain build errors and reduces the number of NUMA sched domain levels.
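The averaging described above can be illustrated with a minimal user-space sketch. It is not the code from the backported patches: the 6-node distance table, the node-to-package mapping, and the names `slit`, `node_to_pkg()` and `sched_domain_distance()` are all hypothetical and chosen only to show the idea for a two-package SNC-3 layout.

```c
#include <assert.h>

#define NR_NODES      6   /* assumed: 2 packages x 3 SNC nodes each */
#define NODES_PER_PKG 3

/* Hypothetical SLIT-style table; the values are illustrative only. */
static const int slit[NR_NODES][NR_NODES] = {
	{ 10, 15, 17, 21, 28, 26 },
	{ 15, 10, 15, 23, 26, 23 },
	{ 17, 15, 10, 26, 23, 21 },
	{ 21, 23, 26, 10, 15, 17 },
	{ 28, 26, 23, 15, 10, 15 },
	{ 26, 23, 21, 17, 15, 10 },
};

/* Assumed mapping: nodes are numbered consecutively within a package. */
static int node_to_pkg(int node)
{
	return node / NODES_PER_PKG;
}

/*
 * Distance used for sched domain construction in this sketch:
 * - same package: keep the table value unchanged;
 * - remote package: return the integer average of the distances from
 *   'from' to every node in the package that contains 'to'.
 */
static int sched_domain_distance(int from, int to)
{
	assert(from >= 0 && from < NR_NODES);
	assert(to >= 0 && to < NR_NODES);

	if (node_to_pkg(from) == node_to_pkg(to))
		return slit[from][to];

	int first = node_to_pkg(to) * NODES_PER_PKG;
	int sum = 0;

	for (int n = first; n < first + NODES_PER_PKG; n++)
		sum += slit[from][n];

	return sum / NODES_PER_PKG;
}
```

With this table, node 0 sees three different raw distances to the remote package (21, 28, 26), but `sched_domain_distance()` reports the same value for all three remote nodes, so they fall into a single NUMA level instead of three.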
The actual SLIT NUMA node distances are still preserved separately, in case they are needed when building sched domains. NUMA balancing continues to use the true distances when selecting a closer remote node for a task’s numa_group.
The following two commits are backported:
- 0002-sched-Create-architecture-specific-sched-domain-dist.patch
- 0003-sched-topology-Fix-sched-domain-build-error-for-GNR-.patch
as well as their necessary dependency:
- 0001-bitmap-Define-a-cleanup-function-for-bitmaps.patch
Testing result w/o fixes:
[ 8.260954] CPU0 attaching sched-domain(s):
[ 8.261112] domain-0: span=0,192 level=SMT
[ 8.262111] groups: 0:{ span=0 cap=976 }, 192:{ span=192 cap=1022 }
[ 8.263111] domain-1: span=0-31,192-223 level=MC
[ 8.264110] groups: 0:{ span=0,192 cap=1998 }, 1:{ span=1,193 cap=2046 }, 2:{ span=2,194 cap=2045 }, 3:{ span=3,195 cap=2046 }, 4:{ span=4,196 cap=2044 }, 5:{ span=5,197 cap=2045 }, 6:{ span=6,198 cap=2046 }, 7:{ span=7,199 cap=2045 }, 8:{ span=8,200 cap=2045 }, 9:{ span=9,201 cap=2047 }, 10:{ span=10,202 cap=2045 }, 11:{ span=11,203 cap=2047 }, 12:{ span=12,204 cap=2044 }, 13:{ span=13,205 cap=2045 }, 14:{ span=14,206 cap=2045 }, 15:{ span=15,207 cap=2045 }, 16:{ span=16,208 cap=2045 }, 17:{ span=17,209 cap=2048 }, 18:{ span=18,210 cap=2047 }, 19:{ span=19,211 cap=2045 }, 20:{ span=20,212 cap=2045 }, 21:{ span=21,213 cap=2046 }, 22:{ span=22,214 cap=2048 }, 23:{ span=23,215 cap=2045 }, 24:{ span=24,216 cap=2047 }, 25:{ span=25,217 cap=2046 }, 26:{ span=26,218 cap=2046 }, 27:{ span=27,219 cap=2045 }, 28:{ span=28,220 cap=2046 }, 29:{ span=29,221 cap=2046 }, 30:{ span=30,222 cap=2044 }, 31:{ span=31,223 cap=2046 }
[ 8.265119] domain-2: span=0-63,192-255 level=NUMA
[ 8.266110] groups: 0:{ span=0-31,192-223 cap=65413 }, 32:{ span=32-63,224-255 cap=65457 }
[ 8.267111] domain-3: span=0-95,192-287 level=NUMA
[ 8.268110] groups: 0:{ span=0-63,192-255 mask=0-31,192-223 cap=130870 }, 64:{ span=32-95,224-287 mask=64-95,256-287 cap=131001 }
[ 8.269111] domain-4: span=0-127,192-319 level=NUMA
[ 8.270110] groups: 0:{ span=0-95,192-287 cap=196381 }, 96:{ span=96-127,288-319 cap=65451 }
[ 8.271111] domain-5: span=0-127,160-319,352-383 level=NUMA
[ 8.272110] groups: 0:{ span=0-127,192-319 mask=0-31,192-223 cap=261832 }, 160:{ span=160-191,352-383 cap=65475 }
[ 8.273112] domain-6: span=0-383 level=NUMA
[ 8.274110] groups: 0:{ span=0-127,160-319,352-383 mask=0-31,192-223 cap=327307 }
[ 8.275111] ERROR: groups don't span domain->span
Testing result w/ fixes:
[ 8.187368] CPU0 attaching sched-domain(s):
[ 8.188143] domain-0: span=0,192 level=SMT
[ 8.189142] groups: 0:{ span=0 cap=887 }, 192:{ span=192 }
[ 8.190141] domain-1: span=0-31,192-223 level=MC
[ 8.191141] groups: 0:{ span=0,192 cap=1911 }, 1:{ span=1,193 cap=2021 }, 2:{ span=2,194 cap=2038 }, 3:{ span=3,195 cap=2040 }, 4:{ span=4,196 cap=2039 }, 5:{ span=5,197 cap=2045 }, 6:{ span=6,198 cap=2041 }, 7:{ span=7,199 cap=2041 }, 8:{ span=8,200 cap=2042 }, 9:{ span=9,201 cap=2033 }, 10:{ span=10,202 cap=2033 }, 11:{ span=11,203 cap=2033 }, 12:{ span=12,204 cap=2045 }, 13:{ span=13,205 cap=2027 }, 14:{ span=14,206 cap=2038 }, 15:{ span=15,207 cap=2035 }, 16:{ span=16,208 cap=2044 }, 17:{ span=17,209 cap=2044 }, 18:{ span=18,210 cap=2039 }, 19:{ span=19,211 cap=2042 }, 20:{ span=20,212 cap=2041 }, 21:{ span=21,213 cap=2048 }, 22:{ span=22,214 cap=2036 }, 23:{ span=23,215 cap=2048 }, 24:{ span=24,216 cap=2021 }, 25:{ span=25,217 cap=2043 }, 26:{ span=26,218 cap=2044 }, 27:{ span=27,219 cap=2041 }, 28:{ span=28,220 cap=2041 }, 29:{ span=29,221 cap=2037 }, 30:{ span=30,222 cap=2036 }, 31:{ span=31,223 cap=2048 }
[ 8.192149] domain-2: span=0-63,192-255 level=NUMA
[ 8.193141] groups: 0:{ span=0-31,192-223 cap=65115 }, 32:{ span=32-63,224-255 cap=65201 }
[ 8.194142] domain-3: span=0-95,192-287 level=NUMA
[ 8.195141] groups: 0:{ span=0-63,192-255 mask=0-31,192-223 cap=130316 }, 64:{ span=32-95,224-287 mask=64-95,256-287 cap=130714 }
[ 8.196142] domain-4: span=0-383 level=NUMA
[ 8.197141] groups: 0:{ span=0-95,192-287 cap=195692 }, 96:{ span=96-191,288-383 cap=195639 }