(Fix)[hive-writer] Fixed the issue when partition values contain spaces when writing to s3.
Proposed changes
Issue Number: close #31442
This change fixes the failure that occurs when the Hive writer writes to S3 and a partition value contains spaces.
Error message
org.apache.doris.common.UserException: errCode = 2, detailMessage = java.net.URISyntaxException: Illegal character in path at index 114: oss://xxxxxxxxxxx/hive/tpcds1000_partition_oss/call_center/cc_call_center_sk=1/cc_mkt_class=A bit narrow forms matter animals. Consist/cc_market_manager=Daniel Weller/cc_rec_end_date=2001-12-31/f6b5ff4253414b06-9fd365ef68e5ddc5_133f02fb-a7e0-4109-9100-fb748a28259e-0.zlib.orc
at org.apache.doris.common.util.S3URI.validateUri(S3URI.java:134) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.parseUri(S3URI.java:120) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.<init>(S3URI.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.S3URI.create(S3URI.java:108) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.obj.S3ObjStorage.deleteObject(S3ObjStorage.java:194) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.remote.ObjFileSystem.delete(ObjFileSystem.java:150) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.fs.remote.SwitchingFileSystem.delete(SwitchingFileSystem.java:92) ~[doris-fe.jar:1.2-
Root Cause
Hadoop escapes some special characters in partition names, but it does not escape spaces, so its escaping differs from URI encoding. A partition path that contains a space is therefore not a valid URI string, and constructing a URI from it throws java.net.URISyntaxException.
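The behaviour described above can be reproduced outside Doris with a small sketch. It assumes only the standard java.net.URI class; the class name SpaceInPathDemo, the bucket name, and the partition values are hypothetical and chosen for illustration, and this is not the code in S3URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SpaceInPathDemo {
    // Returns true if the single-argument URI constructor accepts the string as-is.
    static boolean parsesAsUri(String location) {
        try {
            new URI(location);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical partition path with a raw space in the value.
        String withSpace = "oss://bucket/tbl/cc_market_manager=Daniel Weller/part-0.orc";
        // Hypothetical partition path whose special character is already percent-escaped.
        String escapedOnly = "oss://bucket/tbl/cc_mkt_class=a%3Ab/part-0.orc";

        System.out.println(parsesAsUri(withSpace));   // false: the raw space is illegal in a URI path
        System.out.println(parsesAsUri(escapedOnly)); // true: percent-escaped octets are legal
    }
}
```

The single-argument constructor parses its input strictly and does not quote anything, which is why the unescaped space surfaces as "Illegal character in path".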
Solution
The fix parses the location with a regular expression to split it into its components, then passes each component to the multi-argument URI constructor. Unlike the single-argument constructor, this constructor encodes illegal characters in each component.
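A minimal sketch of this approach is shown below, under stated assumptions: the regular expression, the class name PartitionUriSketch, the method toEncodedUri, and the sample location are all hypothetical and are not taken from the actual patch. It relies only on the five-argument java.net.URI constructor, which quotes characters that are illegal in each component.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartitionUriSketch {
    // Hypothetical pattern: scheme://authority/path with optional ?query and #fragment.
    private static final Pattern LOCATION = Pattern.compile(
            "^([a-zA-Z][a-zA-Z0-9+.-]*)://([^/?#]*)([^?#]*)(?:\\?([^#]*))?(?:#(.*))?$");

    static URI toEncodedUri(String location) {
        Matcher m = LOCATION.matcher(location);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized location: " + location);
        }
        try {
            // The multi-argument constructor quotes illegal characters (such as spaces)
            // in each component instead of rejecting them.
            return new URI(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = toEncodedUri("oss://bucket/tbl/cc_market_manager=Daniel Weller/part-0.orc");
        System.out.println(uri.getRawPath()); // encoded form of the path
        System.out.println(uri.getPath());    // decoded form, with the original space
    }
}
```

One point to keep in mind with this constructor: it always quotes the percent character, so a path that already contains Hadoop-style escapes such as %3A is stored with the percent sign encoded again in the raw form, while getPath() still returns the original text.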
run buildall
TPC-H: Total hot run time: 41410 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 76d85185546dfbbe4ccaabeb4535a87137b9f363, data reload: false
------ Round 1 ----------------------------------
query  cold (ms)  hot run 1 (ms)  hot run 2 (ms)  min hot (ms)
q1 17619 4370 4241 4241
q2 2031 189 196 189
q3 10499 1338 1210 1210
q4 10207 862 848 848
q5 7540 2728 2747 2728
q6 231 132 137 132
q7 970 665 639 639
q8 9224 2179 2128 2128
q9 9827 6733 6753 6733
q10 9545 3930 3888 3888
q11 462 248 266 248
q12 448 240 234 234
q13 17480 3201 3338 3201
q14 256 208 220 208
q15 511 465 472 465
q16 490 408 425 408
q17 1016 680 734 680
q18 8481 7925 7697 7697
q19 5461 1622 1645 1622
q20 648 339 323 323
q21 5227 3253 4195 3253
q22 389 335 335 335
Total cold run time: 118562 ms
Total hot run time: 41410 ms
----- Round 2, with runtime_filter_mode=off -----
query  cold (ms)  hot run 1 (ms)  hot run 2 (ms)  min hot (ms)
q1 4505 4425 4426 4425
q2 374 278 277 277
q3 3187 2964 3009 2964
q4 1903 1610 1628 1610
q5 5452 5551 5520 5520
q6 213 126 130 126
q7 2225 1870 1800 1800
q8 3274 3430 3433 3430
q9 8637 8807 8714 8714
q10 4074 3758 3827 3758
q11 594 485 515 485
q12 788 641 628 628
q13 15885 3172 3144 3144
q14 311 277 278 277
q15 517 500 481 481
q16 498 451 445 445
q17 1818 1529 1510 1510
q18 7813 7537 7345 7345
q19 4619 1616 1588 1588
q20 2039 1775 1797 1775
q21 13811 4890 4854 4854
q22 577 541 531 531
Total cold run time: 83114 ms
Total hot run time: 55687 ms
TPC-DS: Total hot run time: 171580 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 76d85185546dfbbe4ccaabeb4535a87137b9f363, data reload: false
query  cold (ms)  hot run 1 (ms)  hot run 2 (ms)  min hot (ms)
query1 916 404 366 366
query2 6454 2471 2361 2361
query3 6644 217 213 213
query4 19243 17275 17288 17275
query5 4138 427 431 427
query6 250 163 161 161
query7 4592 310 291 291
query8 324 290 285 285
query9 8479 2403 2399 2399
query10 461 288 276 276
query11 10452 10154 10095 10095
query12 140 92 90 90
query13 1630 369 365 365
query14 8522 7749 6116 6116
query15 234 192 192 192
query16 7631 268 262 262
query17 1304 525 520 520
query18 1956 273 271 271
query19 201 158 155 155
query20 93 93 87 87
query21 218 133 130 130
query22 4215 3937 3802 3802
query23 33668 33100 33181 33100
query24 6997 2819 2944 2819
query25 534 350 388 350
query26 706 157 155 155
query27 1928 345 330 330
query28 3808 2099 2124 2099
query29 862 611 596 596
query30 247 151 152 151
query31 945 742 771 742
query32 93 56 54 54
query33 506 285 265 265
query34 845 479 505 479
query35 738 624 638 624
query36 1065 956 917 917
query37 111 65 69 65
query38 2923 2825 2766 2766
query39 880 798 792 792
query40 195 128 126 126
query41 58 51 51 51
query42 106 95 98 95
query43 615 579 555 555
query44 1103 742 752 742
query45 190 181 171 171
query46 1056 733 715 715
query47 1840 1756 1760 1756
query48 360 300 304 300
query49 773 395 389 389
query50 768 395 396 395
query51 6808 6821 6688 6688
query52 101 94 95 94
query53 356 289 293 289
query54 542 436 437 436
query55 76 74 77 74
query56 277 285 246 246
query57 1118 1079 1021 1021
query58 225 213 215 213
query59 3439 3367 3322 3322
query60 282 264 263 263
query61 94 87 91 87
query62 556 446 442 442
query63 316 290 289 289
query64 8462 2261 1689 1689
query65 3206 3099 3125 3099
query66 809 325 339 325
query67 15230 14643 14659 14643
query68 4570 546 532 532
query69 443 275 268 268
query70 1177 1125 1156 1125
query71 401 284 270 270
query72 7645 5822 5338 5338
query73 748 335 328 328
query74 6005 5590 5671 5590
query75 3307 2618 2637 2618
query76 2231 926 965 926
query77 389 273 274 273
query78 11774 10278 9708 9708
query79 2341 520 529 520
query80 1496 446 431 431
query81 527 226 215 215
query82 613 92 91 91
query83 291 175 176 175
query84 267 88 90 88
query85 1010 286 282 282
query86 485 320 309 309
query87 3307 3143 3111 3111
query88 3969 2363 2384 2363
query89 471 408 401 401
query90 2170 193 193 193
query91 200 97 104 97
query92 62 51 53 51
query93 2013 524 510 510
query94 1142 193 186 186
query95 407 317 313 313
query96 595 272 278 272
query97 3154 2995 2987 2987
query98 253 222 218 218
query99 1137 866 833 833
Total cold run time: 258514 ms
Total hot run time: 171580 ms
ClickBench: Total hot run time: 30.83 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 76d85185546dfbbe4ccaabeb4535a87137b9f363, data reload: false
query  cold (s)  hot run 1 (s)  hot run 2 (s)
query1 0.04 0.03 0.03
query2 0.09 0.04 0.04
query3 0.23 0.05 0.05
query4 1.68 0.07 0.07
query5 0.53 0.49 0.49
query6 1.14 0.73 0.72
query7 0.02 0.02 0.01
query8 0.05 0.04 0.04
query9 0.54 0.49 0.48
query10 0.55 0.54 0.54
query11 0.15 0.12 0.12
query12 0.15 0.12 0.11
query13 0.60 0.59 0.60
query14 0.79 0.76 0.78
query15 0.82 0.81 0.82
query16 0.36 0.38 0.35
query17 0.96 0.94 0.97
query18 0.23 0.23 0.26
query19 1.75 1.69 1.71
query20 0.02 0.01 0.01
query21 15.43 0.74 0.69
query22 4.29 6.86 2.31
query23 18.28 1.34 1.26
query24 1.88 0.28 0.22
query25 0.16 0.09 0.08
query26 0.27 0.17 0.17
query27 0.08 0.08 0.09
query28 13.28 1.00 0.99
query29 13.31 3.28 3.27
query30 0.24 0.05 0.05
query31 2.88 0.38 0.40
query32 3.29 0.46 0.47
query33 2.90 2.86 2.88
query34 16.98 4.41 4.43
query35 4.49 4.45 4.62
query36 0.67 0.46 0.48
query37 0.17 0.16 0.16
query38 0.15 0.15 0.15
query39 0.05 0.04 0.04
query40 0.17 0.14 0.14
query41 0.09 0.04 0.04
query42 0.05 0.05 0.04
query43 0.04 0.04 0.04
Total cold run time: 109.85 s
Total hot run time: 30.83 s
run buildall
TPC-H: Total hot run time: 41160 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c886dcbb260656b1bf1ef3ff8895a76a2ab941df, data reload: false
------ Round 1 ----------------------------------
query  cold (ms)  hot run 1 (ms)  hot run 2 (ms)  min hot (ms)
q1 17605 4331 4282 4282
q2 2023 196 202 196
q3 10453 1213 991 991
q4 10193 821 872 821
q5 7445 2752 2721 2721
q6 224 140 141 140
q7 954 633 617 617
q8 9214 2113 2111 2111
q9 9197 6684 6610 6610
q10 9195 3978 3837 3837
q11 462 263 269 263
q12 448 230 232 230
q13 17339 3245 3230 3230
q14 279 248 238 238
q15 524 482 486 482
q16 522 405 402 402
q17 1001 685 675 675
q18 8271 7805 7853 7805
q19 4443 1595 1528 1528
q20 645 323 327 323
q21 5082 3307 4086 3307
q22 409 369 351 351
Total cold run time: 115928 ms
Total hot run time: 41160 ms
----- Round 2, with runtime_filter_mode=off -----
query  cold (ms)  hot run 1 (ms)  hot run 2 (ms)  min hot (ms)
q1 4514 4418 4467 4418
q2 405 278 277 277
q3 3142 2924 2882 2882
q4 2001 1746 1596 1596
q5 5292 5511 5507 5507
q6 214 123 128 123
q7 2216 1813 1811 1811
q8 3234 3420 3375 3375
q9 8614 8594 8687 8594
q10 4123 3781 3740 3740
q11 595 509 526 509
q12 811 648 636 636
q13 16397 3117 3168 3117
q14 303 271 279 271
q15 537 480 484 480
q16 511 435 440 435
q17 1801 1535 1510 1510
q18 7713 7546 7389 7389
q19 1702 1532 1642 1532
q20 2003 1780 1763 1763
q21 10676 4776 4675 4675
q22 630 519 542 519
Total cold run time: 77434 ms
Total hot run time: 55159 ms
TPC-DS: Total hot run time: 169370 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c886dcbb260656b1bf1ef3ff8895a76a2ab941df, data reload: false
query  cold (ms)  hot run 1 (ms)  hot run 2 (ms)  min hot (ms)
query1 941 387 366 366
query2 6444 2344 2418 2344
query3 6630 206 211 206
query4 19127 17297 17043 17043
query5 4109 430 422 422
query6 280 156 153 153
query7 4586 302 309 302
query8 316 292 285 285
query9 8497 2403 2391 2391
query10 442 282 267 267
query11 10582 10172 10098 10098
query12 132 89 88 88
query13 1646 363 365 363
query14 10100 7367 6845 6845
query15 231 188 194 188
query16 7879 266 268 266
query17 1748 530 507 507
query18 1993 286 275 275
query19 202 151 154 151
query20 91 87 85 85
query21 202 131 128 128
query22 4290 3888 3876 3876
query23 33559 32939 33211 32939
query24 6688 2902 2816 2816
query25 543 364 362 362
query26 703 157 158 157
query27 2006 329 321 321
query28 3642 2070 2063 2063
query29 890 604 597 597
query30 226 152 152 152
query31 957 754 748 748
query32 94 52 54 52
query33 506 277 266 266
query34 852 475 485 475
query35 700 592 589 589
query36 1054 930 925 925
query37 99 70 66 66
query38 2893 2764 2763 2763
query39 854 779 809 779
query40 190 125 122 122
query41 53 50 56 50
query42 100 98 93 93
query43 589 551 556 551
query44 1079 724 755 724
query45 192 171 165 165
query46 1059 718 729 718
query47 1837 1741 1796 1741
query48 368 297 299 297
query49 846 393 386 386
query50 769 390 383 383
query51 6839 6601 6749 6601
query52 105 88 90 88
query53 351 286 286 286
query54 554 440 438 438
query55 73 70 73 70
query56 260 242 245 242
query57 1097 1051 1065 1051
query58 229 217 208 208
query59 3281 3257 3250 3250
query60 289 254 261 254
query61 92 93 90 90
query62 555 441 455 441
query63 312 292 292 292
query64 8482 2228 1746 1746
query65 3146 3119 3105 3105
query66 778 338 334 334
query67 15225 14808 14703 14703
query68 4578 543 543 543
query69 499 267 268 267
query70 1127 1146 1067 1067
query71 420 270 271 270
query72 7517 2863 2687 2687
query73 732 337 327 327
query74 6057 5639 5596 5596
query75 3538 2671 2667 2667
query76 2878 1064 1128 1064
query77 598 270 272 270
query78 10215 9736 9743 9736
query79 2160 518 513 513
query80 1000 458 453 453
query81 531 223 218 218
query82 1282 93 94 93
query83 215 179 180 179
query84 246 89 95 89
query85 1210 343 318 318
query86 462 317 292 292
query87 3278 3092 3113 3092
query88 3981 2396 2382 2382
query89 482 397 403 397
query90 2066 192 196 192
query91 136 107 111 107
query92 61 51 51 51
query93 2401 512 506 506
query94 1197 204 198 198
query95 419 322 322 322
query96 595 273 272 272
query97 3214 3031 2993 2993
query98 246 230 219 219
query99 1135 848 842 842
Total cold run time: 259960 ms
Total hot run time: 169370 ms
ClickBench: Total hot run time: 30.71 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c886dcbb260656b1bf1ef3ff8895a76a2ab941df, data reload: false
query  cold (s)  hot run 1 (s)  hot run 2 (s)
query1 0.04 0.03 0.03
query2 0.09 0.04 0.04
query3 0.23 0.05 0.05
query4 1.68 0.09 0.08
query5 0.51 0.47 0.48
query6 1.12 0.72 0.72
query7 0.02 0.02 0.02
query8 0.06 0.04 0.04
query9 0.55 0.49 0.49
query10 0.54 0.57 0.55
query11 0.16 0.12 0.12
query12 0.14 0.12 0.12
query13 0.59 0.58 0.59
query14 0.77 0.80 0.77
query15 0.82 0.83 0.82
query16 0.36 0.38 0.36
query17 0.96 1.03 1.03
query18 0.23 0.25 0.23
query19 1.76 1.77 1.72
query20 0.02 0.01 0.01
query21 15.77 0.65 0.65
query22 4.45 6.83 2.02
query23 18.25 1.38 1.24
query24 1.99 0.22 0.21
query25 0.15 0.09 0.09
query26 0.26 0.17 0.17
query27 0.08 0.08 0.07
query28 13.34 1.01 0.99
query29 13.14 3.32 3.28
query30 0.24 0.05 0.05
query31 2.87 0.38 0.39
query32 3.31 0.46 0.46
query33 2.93 2.90 2.90
query34 17.21 4.39 4.42
query35 4.47 4.50 4.51
query36 0.68 0.52 0.50
query37 0.18 0.15 0.15
query38 0.15 0.14 0.15
query39 0.04 0.04 0.03
query40 0.16 0.13 0.14
query41 0.08 0.04 0.05
query42 0.05 0.04 0.04
query43 0.04 0.04 0.03
Total cold run time: 110.49 s
Total hot run time: 30.71 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.