[Fix](Schema Change) fix schema change failure caused by internal sorting having no chance to run
Proposed changes
When doing a sorting schema change, the memory limit for the changer and for internal sorting is currently the memory limit of the whole schema change task. As a result, internal sorting never gets a chance to run, and the schema change fails.
This PR tries to limit the changer and internal sorting to std::min(0.5*memory_limit_of_schema_change_per_thread, memory_limitation_per_thread_for_schema_change_internal_sorting_bytes), so that both internal sorting and the changer have enough memory to run.
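A minimal sketch of the limit calculation described above, not the actual Doris code. The function name `internal_sorting_mem_limit` and its parameter names are hypothetical; only the `std::min` formula comes from this PR description, and all values are assumed to be non-negative byte counts.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper illustrating the formula from the PR description.
// schema_change_limit_per_thread: memory limit of the schema change task per thread (bytes).
// internal_sorting_limit_config:  value of the new config
//     memory_limitation_per_thread_for_schema_change_internal_sorting_bytes (bytes).
inline int64_t internal_sorting_mem_limit(int64_t schema_change_limit_per_thread,
                                          int64_t internal_sorting_limit_config) {
    // Use at most half of the per-thread schema change budget, so the
    // remainder is left for the rest of the task.
    const int64_t half_of_task_limit = schema_change_limit_per_thread / 2;
    // Never exceed the dedicated internal sorting config.
    return std::min(half_of_task_limit, internal_sorting_limit_config);
}
```

With a 2 GiB per-thread limit and a 1 GiB internal sorting config, this sketch yields 1 GiB; lowering the config to 512 MiB lowers the result to 512 MiB.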
run buildall
TPC-H: Total hot run time: 49891 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 09f2604941de989e80f7d139a08a2d8e0c8a50df, data reload: false
------ Round 1 ----------------------------------
q1 18076 4499 4362 4362
q2 2062 158 151 151
q3 10437 2033 1892 1892
q4 10263 1270 1373 1270
q5 8584 3910 3937 3910
q6 232 121 123 121
q7 2033 1603 1615 1603
q8 9336 2762 2725 2725
q9 10698 10341 10285 10285
q10 8758 3537 3521 3521
q11 425 235 254 235
q12 463 298 307 298
q13 18371 3948 3982 3948
q14 349 324 323 323
q15 515 469 459 459
q16 687 579 565 565
q17 1163 972 994 972
q18 7320 6891 7010 6891
q19 1711 1598 1553 1553
q20 559 326 285 285
q21 4508 4135 4181 4135
q22 506 389 387 387
Total cold run time: 117056 ms
Total hot run time: 49891 ms
----- Round 2, with runtime_filter_mode=off -----
q1 4378 4320 4316 4316
q2 319 230 228 228
q3 4164 4125 4120 4120
q4 2764 2774 2754 2754
q5 7228 7172 7079 7079
q6 239 118 121 118
q7 3267 2793 2857 2793
q8 4403 4515 4511 4511
q9 16796 16745 16870 16745
q10 4241 4249 4268 4249
q11 745 718 673 673
q12 1028 846 845 845
q13 7277 3742 3781 3742
q14 442 433 413 413
q15 505 463 465 463
q16 736 690 684 684
q17 3919 3841 3855 3841
q18 8902 8810 8871 8810
q19 1732 1707 1667 1667
q20 2361 2118 2093 2093
q21 8428 8513 8468 8468
q22 1019 938 932 932
Total cold run time: 84893 ms
Total hot run time: 79544 ms
TeamCity be ut coverage result: Function Coverage: 37.86% (8140/21500) Line Coverage: 29.60% (66989/226333) Region Coverage: 29.09% (34562/118811) Branch Coverage: 25.00% (17808/71244) Coverage Report: http://coverage.selectdb-in.cc/coverage/09f2604941de989e80f7d139a08a2d8e0c8a50df_09f2604941de989e80f7d139a08a2d8e0c8a50df/report/index.html
TPC-DS: Total hot run time: 202783 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 09f2604941de989e80f7d139a08a2d8e0c8a50df, data reload: false
query1 945 395 415 395
query2 6540 2233 2193 2193
query3 6920 212 203 203
query4 21325 18132 18085 18085
query5 19743 6541 6551 6541
query6 292 216 238 216
query7 4150 308 321 308
query8 268 262 230 230
query9 3117 2745 2621 2621
query10 435 290 305 290
query11 11415 10730 10708 10708
query12 131 80 74 74
query13 5598 651 665 651
query14 18009 13766 13013 13013
query15 366 214 233 214
query16 6458 285 261 261
query17 1706 1453 872 872
query18 2316 410 417 410
query19 211 155 151 151
query20 82 78 80 78
query21 189 96 91 91
query22 5215 5063 4860 4860
query23 32708 31910 31980 31910
query24 7015 6517 6465 6465
query25 515 426 413 413
query26 527 165 167 165
query27 1915 297 305 297
query28 6065 2294 2251 2251
query29 2933 2713 2885 2713
query30 241 168 166 166
query31 917 770 773 770
query32 71 68 64 64
query33 387 264 253 253
query34 860 481 481 481
query35 1132 893 929 893
query36 1325 1254 1138 1138
query37 89 64 60 60
query38 3071 2926 2972 2926
query39 1375 1330 1322 1322
query40 212 98 95 95
query41 42 37 39 37
query42 86 88 86 86
query43 649 615 580 580
query44 1140 719 723 719
query45 247 233 230 230
query46 1237 967 985 967
query47 1971 1772 1663 1663
query48 980 676 653 653
query49 638 376 376 376
query50 876 613 621 613
query51 4715 4718 4584 4584
query52 97 87 86 86
query53 444 323 326 323
query54 2668 2464 2469 2464
query55 95 82 88 82
query56 233 235 206 206
query57 1136 1114 1092 1092
query58 223 213 216 213
query59 3553 3363 3460 3363
query60 218 212 202 202
query61 96 92 93 92
query62 822 430 487 430
query63 484 364 347 347
query64 2573 1526 1379 1379
query65 3591 3555 3565 3555
query66 815 379 387 379
query67 16221 16911 16096 16096
query68 8373 652 654 652
query69 579 357 395 357
query70 1651 1322 1433 1322
query71 404 312 329 312
query72 6511 3512 3506 3506
query73 740 333 330 330
query74 6329 5897 5900 5897
query75 4811 3729 3720 3720
query76 4867 1157 1216 1157
query77 693 261 268 261
query78 12677 11730 18466 11730
query79 6427 664 651 651
query80 1712 405 414 405
query81 505 241 232 232
query82 269 98 101 98
query83 177 136 142 136
query84 262 74 70 70
query85 1167 319 324 319
query86 348 294 334 294
query87 3255 3041 3009 3009
query88 4322 2378 2388 2378
query89 345 285 304 285
query90 1827 216 214 214
query91 163 122 128 122
query92 61 53 56 53
query93 924 564 609 564
query94 811 215 214 214
query95 1146 1099 1059 1059
query96 640 336 339 336
query97 6530 6380 6395 6380
query98 194 173 179 173
query99 2940 932 883 883
Total cold run time: 304299 ms
Total hot run time: 202783 ms
ClickBench: Total hot run time: 31.42 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 09f2604941de989e80f7d139a08a2d8e0c8a50df, data reload: false
query1 0.02 0.02 0.02
query2 0.07 0.02 0.03
query3 0.24 0.04 0.05
query4 1.80 0.06 0.07
query5 0.53 0.52 0.52
query6 1.24 0.60 0.61
query7 0.02 0.01 0.01
query8 0.04 0.03 0.02
query9 0.52 0.50 0.46
query10 0.53 0.55 0.54
query11 0.11 0.09 0.08
query12 0.12 0.09 0.08
query13 0.63 0.62 0.63
query14 0.80 0.78 0.78
query15 0.79 0.75 0.76
query16 0.36 0.35 0.39
query17 1.01 1.02 1.03
query18 0.21 0.27 0.23
query19 1.92 1.87 1.79
query20 0.02 0.01 0.01
query21 15.47 0.62 0.56
query22 2.51 2.06 1.74
query23 17.43 0.98 0.82
query24 4.68 3.50 1.57
query25 0.42 0.13 0.05
query26 0.76 0.16 0.16
query27 0.04 0.04 0.03
query28 5.44 0.76 0.77
query29 12.81 2.47 2.38
query30 0.56 0.54 0.54
query31 2.80 0.40 0.38
query32 3.34 0.49 0.50
query33 3.07 3.07 3.05
query34 15.24 4.79 4.82
query35 4.86 4.87 4.84
query36 1.06 1.02 1.02
query37 0.06 0.05 0.04
query38 0.04 0.02 0.02
query39 0.02 0.02 0.01
query40 0.16 0.14 0.14
query41 0.07 0.01 0.02
query42 0.02 0.01 0.02
query43 0.02 0.01 0.02
Total cold run time: 101.86 s
Total hot run time: 31.42 s
Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Load test result on commit 09f2604941de989e80f7d139a08a2d8e0c8a50df with default session variables
Stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc: 58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select: 21.2 seconds inserted 10000000 Rows, about 471K ops/s
Closing this, as the issue is fixed by https://github.com/apache/doris/pull/39995
fixed in #39995
PR #39995 does not fix the schema change failure completely: in our environment, the failure still occurs for some big tables even after changing the config memory_limitation_per_thread_for_schema_change_bytes. So I am reopening this PR, as it introduces another config, memory_limitation_per_thread_for_schema_change_internal_sorting_bytes, to limit memory usage in sorting schema change.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and feel free to ask a maintainer to remove the Stale tag!