Clickhouse string ordering and string filtering by UTF8 instead of bytes
Clickhouse defaults to using bytes to order by and string manipulation functions such as lower, upper uses ascii. To overcome this limitation they have COLLATE keyword, and lowerUTF8, upperUTF8 functions.
Check List
- [X] Tests has been run in packages where changes made if available
- [X] Linter has been run for changed code
- [X] Tests for the changes have been added if not covered yet
- [X] Docs have been added / updated if required
Description of Changes Made (if issue reference is not provided)
- Replaced
CONCATSQL function with js template literal to prevent unnecessary DB function call - Used
lowerUTF8instead oflowerto support utf8 compatible search - Added
COLLATE āenāwhen ordering by strings to order incasesensitive. Clickhouse orders by bytes on default.
Hey @casab ! Thanks for contributing! Could you please add an integration test for it?
The latest updates on your projects. Learn more about Vercel for Git āļø
8 Skipped Deployments
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| examples-angular-dashboard | ā¬ļø Ignored (Inspect) | Visit Preview | Nov 22, 2024 1:40pm | |
| examples-react-d3 | ā¬ļø Ignored (Inspect) | Visit Preview | Nov 22, 2024 1:40pm | |
| examples-react-dashboard | ā¬ļø Ignored (Inspect) | Visit Preview | Nov 22, 2024 1:40pm | |
| examples-react-data-table | ā¬ļø Ignored (Inspect) | Visit Preview | Nov 22, 2024 1:40pm | |
| examples-react-highcharts | ā¬ļø Ignored (Inspect) | Visit Preview | Nov 22, 2024 1:40pm | |
| examples-react-material-ui | ā¬ļø Ignored (Inspect) | Visit Preview | Nov 22, 2024 1:40pm | |
| examples-react-pivot-table | ā¬ļø Ignored (Inspect) | Visit Preview | Nov 22, 2024 1:40pm | |
| examples-vue-query-builder | ā¬ļø Ignored (Inspect) | Visit Preview | Nov 22, 2024 1:40pm |
@casab is attempting to deploy a commit to the Cube Dev Team on Vercel.
A member of the Team first needs to authorize it.
@casab May I kindly ask you to test with the latest release and rebase your changes on top of it? We have migrated to a new ClickHouse client library recently. Thanks in advance!
@igorlukanin Can you approve running the workflows?
Hi @casab Could you please sync with theĀ latest master,Ā resolve conflicts andĀ fix warnings/errors? Ping me whenever you need toĀ approve running workflows!
Btw, forĀ aĀ quick linter fixes you can run yarn run lint:fix
@KSDaemon of course, done it.
@casab Hey! Thnx forĀ theĀ updates! Unfortunately, some lint errors still exists:
/home/runner/work/cube/cube/packages/cubejs-schema-compiler/src/adapter/ClickHouseQuery.ts
178:1 error Expected indentation of 6 spaces but found 8 indent
178:16 warning Strings must use singlequote quotes
182:1 error Expected indentation of 6 spaces but found 8 indent
182:26 error Unexpected string concatenation prefer-template
182:58 warning Strings must use singlequote quotes
183:1 error Expected indentation of 6 spaces but found 8 indent
184:1 error Expected indentation of 6 spaces but found 8 indent
184:16 warning Strings must use singlequote quotes
188:1 error Expected indentation of 6 spaces but found 8 indent
188:16 warning Strings must use singlequote quotes
192:1 error Expected indentation of 2 spaces but found 0 indent
Maybe thats because ofĀ merge... But anyway...
Can you fix them, please?
@KSDaemon Yeah sorry about that. I have fixed the linter issues now. It's just that this PR is haunting me for 2 years...
Ah.. Sorry toĀ hear that! But... Don't worry! We'll nail it down!
Let me check these integration test results and I'll make the necessary changes.
@KSDaemon I think it's alright now. Apparently I wasn't checking if the order field's type is string or not. I also modified the other integration tests for the new test case I have added.
@casab Yeah, great addition, thatĀ was one ofĀ my questions, I want toĀ ask! :)
What about theĀ collation: what if theĀ underlying values are inĀ a different language other than English? Current code hardcode theĀ collation 'en'.
@KSDaemon I thought about it. When I created this PR, we were having problems with Turkish ordering in the company I used to work. But the biggest problem is not even the Turkish characters. Ordering is not case insensitive. 'en' would solve huge portion of that. I thought about adding a parameter to the query. Or maybe fetch it from cube definition.
What do you suggest? Where can we get the selected collation from?
Codecov Report
Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.
Project coverage is 80.73%. Comparing base (
1203291) to head (79373d6). Report is 343 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...t/cubesql/cubesql/src/compile/engine/df/wrapper.rs | 96.15% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #6143 +/- ##
==========================================
- Coverage 83.57% 80.73% -2.84%
==========================================
Files 227 227
Lines 81618 81646 +28
==========================================
- Hits 68210 65916 -2294
- Misses 13408 15730 +2322
| Flag | Coverage Ī | |
|---|---|---|
| cubesql | 80.73% <96.55%> (-2.84%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
@KSDaemon @mcheshkov Hey, I have fixed all the errors. And implemented all the required parts. I would appreciate it if you can review it. Especially the rust part.
@casab We're actively looking into merging this PR. Could you please resolve the conflicts and resolve on top of master once again? Thanks in advance!
@igorlukanin sure, I'll look into it and let you know.