[Fix](multi-catalog) Fix string dictionary filtering when using NULL-related functions in the Parquet and ORC readers by disabling dictionary filtering when predicates contain functions.
Proposed changes
Issue
The following SQL queries return incorrect results when the predicate on a dictionary-encoded string column involves NULL-related functions:
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
Root cause:
The current dictionary filtering implementation does not account for NULL values, because the dictionary itself contains no encoding for NULL. As a result, many NULL-related functions and expressions, such as IS NULL, IS NOT NULL, and COALESCE, produce incorrect results when dictionary filtering is applied.
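To illustrate the root cause, here is a minimal Python sketch of dictionary filtering. It is a toy model, not the Doris reader code: the names `dictionary_filter` and `row_by_row_filter`, the sample values, and the data layout (a list of dictionary codes in which `None` stands for a NULL row) are assumptions made purely for illustration.

```python
from typing import Callable, List, Optional

# Toy model of a dictionary-encoded string column (hypothetical data).
# The dictionary holds only non-NULL values; NULL rows have no dictionary code.
dictionary: List[str] = ["1-URGENT", "2-HIGH", "3-MEDIUM"]
codes: List[Optional[int]] = [0, None, 2, None, 1]  # None marks a NULL row


def predicate(value: Optional[str]) -> bool:
    """Models: IFNULL(value, 'null') = 'null'."""
    return (value if value is not None else "null") == "null"


def row_by_row_filter(pred: Callable[[Optional[str]], bool]) -> List[int]:
    """Reference result: decode every row, then evaluate the predicate on it."""
    selected = []
    for row, code in enumerate(codes):
        value = dictionary[code] if code is not None else None
        if pred(value):
            selected.append(row)
    return selected


def dictionary_filter(pred: Callable[[Optional[str]], bool]) -> List[int]:
    """Evaluate the predicate once per dictionary entry, then keep the rows
    whose code matched. NULL rows are never shown to the predicate."""
    matching_codes = {code for code, value in enumerate(dictionary) if pred(value)}
    return [row for row, code in enumerate(codes) if code in matching_codes]


expected = row_by_row_filter(predicate)  # the NULL rows match the predicate
actual = dictionary_filter(predicate)    # no dictionary entry matches
print(expected, actual)
```

In this model the row-by-row evaluation selects the two NULL rows, while the dictionary-based evaluation selects nothing, because no dictionary entry can ever turn into `'null'`. That mismatch mirrors the incorrect results described above.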
Solution
As a first step, this PR disables dictionary filtering whenever the predicate contains functions. Dictionary filtering that handles NULL values will be implemented later.
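The guard described in the solution can be sketched as a recursive check over the predicate's expression tree. This is a hedged Python illustration only: the `Expr` class, its `kind` values, and the helper names `contains_function` and `can_use_dictionary_filter` are hypothetical and do not correspond to the actual Doris classes or to the code in this patch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Hypothetical expression node, used for illustration only."""
    kind: str  # one of "column", "literal", "compare", "function"
    name: str = ""
    children: List["Expr"] = field(default_factory=list)


def contains_function(expr: Expr) -> bool:
    """Return True if any node in the expression tree is a function call."""
    if expr.kind == "function":
        return True
    return any(contains_function(child) for child in expr.children)


def can_use_dictionary_filter(expr: Expr) -> bool:
    """Conservative rule: allow dictionary filtering only for function-free predicates."""
    return not contains_function(expr)


# o_orderpriority = 'null'
plain = Expr("compare", "=", [
    Expr("column", "o_orderpriority"),
    Expr("literal", "'null'"),
])

# IFNULL(o_orderpriority, 'null') = 'null'
wrapped = Expr("compare", "=", [
    Expr("function", "ifnull", [
        Expr("column", "o_orderpriority"),
        Expr("literal", "'null'"),
    ]),
    Expr("literal", "'null'"),
])

print(can_use_dictionary_filter(plain))
print(can_use_dictionary_filter(wrapped))
```

The rule is deliberately conservative: it also turns off dictionary filtering for functions that have nothing to do with NULL, trading some scan performance for correctness until NULL-aware dictionary filtering exists.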
run buildall
clang-tidy review says "All clean, LGTM! :+1:"
TPC-H: Total hot run time: 39799 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5efc1d5f7942574acdcd03c4d6c972e0d2801dac, data reload: false
------ Round 1 ----------------------------------
query  cold  hot run 1  hot run 2  best hot  (all times in ms)
q1 17896 4485 4279 4279
q2 2700 212 189 189
q3 11609 1171 1156 1156
q4 10603 771 882 771
q5 7616 2729 2740 2729
q6 220 132 136 132
q7 945 608 603 603
q8 9564 2058 2044 2044
q9 8902 6452 6429 6429
q10 8925 3684 3682 3682
q11 449 248 242 242
q12 427 219 214 214
q13 18119 2982 2965 2965
q14 256 218 232 218
q15 503 468 474 468
q16 518 381 377 377
q17 951 626 750 626
q18 8077 7452 7475 7452
q19 4060 1549 1440 1440
q20 641 295 312 295
q21 4955 3209 3809 3209
q22 328 279 279 279
Total cold run time: 118264 ms
Total hot run time: 39799 ms
----- Round 2, with runtime_filter_mode=off -----
query  cold  hot run 1  hot run 2  best hot  (all times in ms)
q1 4334 4207 4189 4189
q2 364 264 274 264
q3 3004 2794 2735 2735
q4 1850 1605 1596 1596
q5 5221 5252 5262 5252
q6 212 124 127 124
q7 2072 1766 1748 1748
q8 3148 3280 3256 3256
q9 8309 8330 8316 8316
q10 3855 3657 3641 3641
q11 582 474 482 474
q12 758 560 575 560
q13 17447 2982 3003 2982
q14 295 265 264 264
q15 519 471 465 465
q16 469 424 411 411
q17 1754 1488 1468 1468
q18 7616 7588 7464 7464
q19 2799 1559 1545 1545
q20 1969 1786 1771 1771
q21 4843 4652 4697 4652
q22 565 474 499 474
Total cold run time: 71985 ms
Total hot run time: 53651 ms
TeamCity be ut coverage result: Function Coverage: 35.66% (9019/25295) Line Coverage: 27.32% (74583/273043) Region Coverage: 26.54% (38601/145432) Branch Coverage: 23.40% (19690/84134) Coverage Report: http://coverage.selectdb-in.cc/coverage/5efc1d5f7942574acdcd03c4d6c972e0d2801dac_5efc1d5f7942574acdcd03c4d6c972e0d2801dac/report/index.html
TPC-DS: Total hot run time: 169494 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5efc1d5f7942574acdcd03c4d6c972e0d2801dac, data reload: false
query  cold  hot run 1  hot run 2  best hot  (all times in ms)
query1 911 379 378 378
query2 6452 2292 2373 2292
query3 6653 206 221 206
query4 19179 17217 17134 17134
query5 4162 411 411 411
query6 256 157 158 157
query7 4582 311 291 291
query8 242 188 184 184
query9 8615 2369 2360 2360
query10 452 271 277 271
query11 10511 10028 9922 9922
query12 138 86 88 86
query13 1650 354 351 351
query14 10208 7486 7611 7486
query15 212 172 168 168
query16 7882 267 259 259
query17 1838 535 519 519
query18 1973 272 281 272
query19 208 169 184 169
query20 89 86 81 81
query21 195 135 128 128
query22 4072 3900 3857 3857
query23 33523 33259 33125 33125
query24 12024 2815 2872 2815
query25 694 344 355 344
query26 1791 159 156 156
query27 2936 315 328 315
query28 7386 2009 2007 2007
query29 1121 597 593 593
query30 311 171 173 171
query31 953 760 750 750
query32 97 52 52 52
query33 771 265 286 265
query34 1016 459 483 459
query35 715 579 582 579
query36 1053 919 888 888
query37 272 69 75 69
query38 2904 2792 2779 2779
query39 857 790 775 775
query40 279 122 125 122
query41 46 45 43 43
query42 102 94 97 94
query43 567 538 542 538
query44 1201 714 719 714
query45 182 166 165 165
query46 1072 734 725 725
query47 1846 1749 1777 1749
query48 364 290 286 286
query49 1187 396 383 383
query50 770 377 415 377
query51 6862 6737 6774 6737
query52 98 93 92 92
query53 349 281 286 281
query54 989 421 423 421
query55 72 72 72 72
query56 267 239 234 234
query57 1145 1047 1060 1047
query58 241 215 227 215
query59 3199 3189 3164 3164
query60 278 260 257 257
query61 96 91 111 91
query62 631 445 471 445
query63 308 282 280 280
query64 9772 2231 1742 1742
query65 3168 3120 3126 3120
query66 1382 349 321 321
query67 15474 15308 14723 14723
query68 4545 541 564 541
query69 438 275 258 258
query70 1127 1073 1097 1073
query71 406 268 269 268
query72 7567 5349 2732 2732
query73 713 323 319 319
query74 5998 5692 5633 5633
query75 3405 2626 2608 2608
query76 2856 967 920 920
query77 438 269 264 264
query78 10184 9788 9965 9788
query79 2415 510 516 510
query80 998 436 425 425
query81 523 244 246 244
query82 662 91 95 91
query83 236 170 170 170
query84 240 90 85 85
query85 1639 365 267 267
query86 484 276 294 276
query87 3321 3183 3151 3151
query88 4084 2326 2339 2326
query89 480 412 388 388
query90 2002 189 191 189
query91 127 100 95 95
query92 61 46 48 46
query93 1565 507 485 485
query94 1208 182 184 182
query95 407 303 310 303
query96 571 259 266 259
query97 3222 2973 3068 2973
query98 240 213 216 213
query99 1103 850 870 850
Total cold run time: 274116 ms
Total hot run time: 169494 ms
ClickBench: Total hot run time: 30.57 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5efc1d5f7942574acdcd03c4d6c972e0d2801dac, data reload: false
query  cold  hot run 1  hot run 2  (all times in s)
query1 0.04 0.04 0.04
query2 0.08 0.05 0.04
query3 0.23 0.05 0.05
query4 1.69 0.07 0.06
query5 0.48 0.47 0.51
query6 1.11 0.73 0.72
query7 0.01 0.01 0.02
query8 0.05 0.04 0.04
query9 0.52 0.49 0.50
query10 0.54 0.53 0.55
query11 0.14 0.10 0.11
query12 0.15 0.12 0.11
query13 0.61 0.59 0.59
query14 0.77 0.79 0.77
query15 0.82 0.80 0.80
query16 0.34 0.37 0.37
query17 1.02 1.01 1.01
query18 0.22 0.25 0.25
query19 1.74 1.64 1.70
query20 0.02 0.01 0.01
query21 15.71 0.67 0.65
query22 4.46 7.11 2.11
query23 18.30 1.34 1.19
query24 1.62 0.37 0.19
query25 0.13 0.08 0.08
query26 0.25 0.17 0.17
query27 0.08 0.07 0.07
query28 13.27 1.01 1.07
query29 12.76 3.31 3.30
query30 0.24 0.07 0.05
query31 2.87 0.37 0.37
query32 3.31 0.46 0.47
query33 2.84 2.91 2.86
query34 16.99 4.43 4.42
query35 4.48 4.51 4.49
query36 0.64 0.48 0.46
query37 0.18 0.15 0.15
query38 0.15 0.15 0.14
query39 0.04 0.04 0.03
query40 0.17 0.13 0.14
query41 0.09 0.05 0.06
query42 0.06 0.05 0.05
query43 0.04 0.04 0.04
Total cold run time: 109.26 s
Total hot run time: 30.57 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.