[opt](inverted index) the "unicode" tokenizer can be configured to disable stop words
Proposed changes
- Index properties: setting "parser" = "unicode" together with "use_stopwords" = "none" disables stop-word filtering for the unicode tokenizer.
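To make the effect of this setting concrete, here is a minimal Python sketch of stop-word filtering at index time. It is not Doris's tokenizer: the function `index_terms`, the `DEMO_STOPWORDS` list, and the regex-based splitting are all assumptions made for illustration only.

```python
import re

# Hypothetical stop-word list for illustration; not the list Doris ships.
DEMO_STOPWORDS = frozenset({"a", "an", "the", "is", "of", "to"})


def index_terms(text, use_stopwords=True):
    """Lowercase the text, split on non-word characters, and
    optionally drop stop words before the terms are indexed."""
    tokens = [t for t in re.split(r"\W+", text.lower()) if t]
    if not use_stopwords:
        # Stop words disabled: every token becomes an index term.
        return tokens
    return [t for t in tokens if t not in DEMO_STOPWORDS]


print(index_terms("The name of the rose"))
print(index_terms("The name of the rose", use_stopwords=False))
```

With filtering on, only `name` and `rose` would be searchable in this toy model. With filtering off, words such as `the` and `of` are kept as terms, so queries that rely on them can still match.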
clang-tidy review says "All clean, LGTM! :+1:"
run buildall
clang-tidy review says "All clean, LGTM! :+1:"
TPC-H: Total hot run time: 40210 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit c9bfa966b2fb67d9b8de0a5c39dc53ab57b8793b, data reload: false
Columns: query, cold run, hot run 1, hot run 2, best hot run (all times in ms)
------ Round 1 ----------------------------------
q1 18514 4570 4374 4374
q2 2534 197 195 195
q3 10995 1189 1181 1181
q4 10460 783 777 777
q5 7487 2805 2670 2670
q6 212 131 133 131
q7 1034 609 573 573
q8 9237 2132 2115 2115
q9 9084 6575 6533 6533
q10 8975 3703 3740 3703
q11 460 234 231 231
q12 482 220 217 217
q13 17774 2959 2953 2953
q14 257 214 217 214
q15 512 461 469 461
q16 506 389 369 369
q17 976 632 684 632
q18 8084 7432 7417 7417
q19 4400 1531 1525 1525
q20 659 318 315 315
q21 5051 3342 4007 3342
q22 353 296 282 282
Total cold run time: 118046 ms
Total hot run time: 40210 ms
----- Round 2, with runtime_filter_mode=off -----
q1 4376 4230 4212 4212
q2 378 255 264 255
q3 2965 2780 2776 2776
q4 1866 1608 1608 1608
q5 5306 5300 5249 5249
q6 207 124 123 123
q7 2238 1926 1882 1882
q8 3194 3374 3384 3374
q9 8446 8480 8467 8467
q10 3915 3756 3640 3640
q11 572 475 473 473
q12 765 626 595 595
q13 16612 2979 2991 2979
q14 295 267 261 261
q15 508 470 470 470
q16 476 403 411 403
q17 1764 1486 1472 1472
q18 7705 7441 7483 7441
q19 2360 1565 1530 1530
q20 1966 1765 1753 1753
q21 5013 4751 4841 4751
q22 577 485 490 485
Total cold run time: 71504 ms
Total hot run time: 54199 ms
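The totals in the TPC-H report above can be reproduced from the per-query rows. The sketch below assumes the columns are query, cold run, two hot runs, and the faster of the two hot runs. That reading is inferred from the numbers, not stated by the benchmark bot. The helper name `totals` is hypothetical, and only the first three Round 1 rows are used.

```python
# First three Round 1 rows from the TPC-H report above:
# (query, cold run, hot run 1, hot run 2), all in ms.
rows = [
    ("q1", 18514, 4570, 4374),
    ("q2", 2534, 197, 195),
    ("q3", 10995, 1189, 1181),
]


def totals(rows):
    """Return (total cold, total hot), taking the faster hot run per query."""
    cold = sum(r[1] for r in rows)
    hot = sum(min(r[2], r[3]) for r in rows)
    return cold, hot


print(totals(rows))  # (32043, 5750)
```

Applying the same sums to all 22 rows of Round 1 gives the reported 118046 ms cold and 40210 ms hot.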
TPC-DS: Total hot run time: 184579 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c9bfa966b2fb67d9b8de0a5c39dc53ab57b8793b, data reload: false
Columns: query, cold run, hot run 1, hot run 2, best hot run (all times in ms)
query1 918 357 352 352
query2 6444 2404 2362 2362
query3 6658 212 216 212
query4 23434 21129 21085 21085
query5 4201 428 428 428
query6 270 182 185 182
query7 4602 294 295 294
query8 253 189 185 185
query9 8415 2408 2379 2379
query10 444 247 247 247
query11 14668 14118 14099 14099
query12 139 87 94 87
query13 1642 355 366 355
query14 9394 6540 8202 6540
query15 261 175 175 175
query16 8193 260 258 258
query17 1896 559 548 548
query18 2120 271 267 267
query19 315 151 146 146
query20 92 86 82 82
query21 200 129 125 125
query22 5031 4909 4800 4800
query23 33952 33329 33045 33045
query24 11099 2833 2860 2833
query25 625 352 355 352
query26 1643 150 146 146
query27 3041 319 328 319
query28 7801 2037 2052 2037
query29 951 615 595 595
query30 293 148 151 148
query31 1001 734 721 721
query32 93 51 55 51
query33 742 245 241 241
query34 1018 481 489 481
query35 801 662 665 662
query36 1068 900 887 887
query37 140 67 65 65
query38 3105 3015 2984 2984
query39 1628 1562 1537 1537
query40 274 126 126 126
query41 41 38 37 37
query42 103 95 98 95
query43 599 533 534 533
query44 1276 738 740 738
query45 268 255 256 255
query46 1072 751 733 733
query47 1956 1880 1906 1880
query48 363 294 301 294
query49 1116 389 406 389
query50 778 397 391 391
query51 6721 6586 6598 6586
query52 106 88 98 88
query53 345 276 271 271
query54 289 231 224 224
query55 76 71 74 71
query56 243 215 222 215
query57 1259 1164 1130 1130
query58 228 190 194 190
query59 3464 3133 3165 3133
query60 251 232 231 231
query61 89 86 90 86
query62 647 439 444 439
query63 303 276 274 274
query64 9562 7225 7219 7219
query65 3107 3047 3031 3031
query66 1379 339 336 336
query67 15675 15032 14949 14949
query68 5136 546 542 542
query69 471 295 301 295
query70 1120 1127 1118 1118
query71 417 268 264 264
query72 7994 2515 2353 2353
query73 703 318 326 318
query74 6462 6127 6042 6042
query75 3323 2646 2612 2612
query76 2849 1053 954 954
query77 419 266 268 266
query78 10851 10161 10327 10161
query79 2874 533 527 527
query80 2006 430 423 423
query81 528 218 222 218
query82 775 96 96 96
query83 299 180 171 171
query84 283 89 90 89
query85 2037 273 263 263
query86 499 290 315 290
query87 3337 3090 3058 3058
query88 4595 2318 2313 2313
query89 492 380 374 374
query90 1975 183 196 183
query91 128 98 97 97
query92 59 50 48 48
query93 4927 520 512 512
query94 1259 191 186 186
query95 398 311 310 310
query96 610 264 261 261
query97 3146 2919 2984 2919
query98 234 220 214 214
query99 1259 846 873 846
Total cold run time: 291646 ms
Total hot run time: 184579 ms
TeamCity BE UT coverage result:
Function Coverage: 35.69% (8960/25102)
Line Coverage: 27.28% (73898/270842)
Region Coverage: 26.47% (38176/144211)
Branch Coverage: 23.23% (19447/83722)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c9bfa966b2fb67d9b8de0a5c39dc53ab57b8793b_c9bfa966b2fb67d9b8de0a5c39dc53ab57b8793b/report/index.html
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
run buildall
clang-tidy review says "All clean, LGTM! :+1:"
run buildall
clang-tidy review says "All clean, LGTM! :+1:"
TeamCity BE UT coverage result:
Function Coverage: 35.69% (8959/25103)
Line Coverage: 27.29% (73908/270843)
Region Coverage: 26.47% (38178/144212)
Branch Coverage: 23.23% (19448/83722)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e492979421e48575145d1f2fefdbd36539300ee5_e492979421e48575145d1f2fefdbd36539300ee5/report/index.html
TPC-DS: Total hot run time: 187132 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e492979421e48575145d1f2fefdbd36539300ee5, data reload: false
Columns: query, cold run, hot run 1, hot run 2, best hot run (all times in ms)
query1 919 365 346 346
query2 6289 2367 2343 2343
query3 6656 210 215 210
query4 23230 21709 21876 21709
query5 3900 437 409 409
query6 264 189 179 179
query7 4532 308 308 308
query8 230 187 192 187
query9 8583 2440 2410 2410
query10 412 260 251 251
query11 15239 14812 14811 14811
query12 127 89 91 89
query13 1653 386 374 374
query14 9618 6886 8456 6886
query15 285 184 165 165
query16 8199 265 268 265
query17 1801 572 553 553
query18 2109 284 275 275
query19 333 157 155 155
query20 92 83 84 83
query21 195 135 125 125
query22 5050 4910 4840 4840
query23 33995 33079 33032 33032
query24 10293 2932 2909 2909
query25 574 386 388 386
query26 696 168 155 155
query27 2083 318 311 311
query28 5990 2086 2042 2042
query29 886 684 606 606
query30 225 153 147 147
query31 944 715 717 715
query32 95 51 50 50
query33 635 238 246 238
query34 899 476 473 473
query35 824 669 662 662
query36 1085 898 886 886
query37 100 64 66 64
query38 3304 3007 3027 3007
query39 1566 1543 1544 1543
query40 204 128 128 128
query41 39 37 39 37
query42 104 97 96 96
query43 574 569 548 548
query44 1095 734 737 734
query45 272 255 258 255
query46 1072 692 721 692
query47 1951 1864 1841 1841
query48 403 300 302 300
query49 827 388 390 388
query50 785 391 391 391
query51 6793 6563 6551 6551
query52 105 95 93 93
query53 356 286 283 283
query54 316 238 261 238
query55 83 78 77 77
query56 237 269 218 218
query57 1204 1134 1117 1117
query58 222 202 202 202
query59 3470 3107 3099 3099
query60 244 228 232 228
query61 89 86 85 85
query62 634 439 464 439
query63 309 282 290 282
query64 8330 7332 7228 7228
query65 3113 3061 3078 3061
query66 835 327 346 327
query67 15925 15338 15316 15316
query68 10645 547 548 547
query69 600 305 307 305
query70 1393 1128 1118 1118
query71 528 276 280 276
query72 8816 2533 2385 2385
query73 1600 329 318 318
query74 6612 6121 6130 6121
query75 5495 2700 2676 2676
query76 5895 1008 970 970
query77 702 268 273 268
query78 11096 10352 10236 10236
query79 11445 535 518 518
query80 1878 441 433 433
query81 509 226 219 219
query82 233 87 90 87
query83 209 170 163 163
query84 269 83 85 83
query85 949 278 268 268
query86 347 310 302 302
query87 3291 3122 3106 3106
query88 6308 2363 2326 2326
query89 512 389 386 386
query90 2449 186 180 180
query91 126 97 95 95
query92 60 47 47 47
query93 7656 512 503 503
query94 1563 178 182 178
query95 403 313 300 300
query96 597 270 263 263
query97 3149 2965 2985 2965
query98 235 217 211 211
query99 1106 879 826 826
Total cold run time: 310177 ms
Total hot run time: 187132 ms
PR approved by at least one committer and no changes requested.