cube icon indicating copy to clipboard operation
cube copied to clipboard

Clickhouse string ordering and string filtering by UTF8 instead of bytes

Open casab opened this issue 3 years ago • 18 comments

Clickhouse defaults to using bytes to order by and string manipulation functions such as lower, upper uses ascii. To overcome this limitation they have COLLATE keyword, and lowerUTF8, upperUTF8 functions.

Check List

  • [X] Tests has been run in packages where changes made if available
  • [X] Linter has been run for changed code
  • [X] Tests for the changes have been added if not covered yet
  • [X] Docs have been added / updated if required

Description of Changes Made (if issue reference is not provided)

  • Replaced CONCAT SQL function with js template literal to prevent unnecessary DB function call
  • Used lowerUTF8 instead of lower to support utf8 compatible search
  • Added COLLATE ā€˜en’ when ordering by strings to order incasesensitive. Clickhouse orders by bytes on default.

casab avatar Feb 09 '23 14:02 casab

Hey @casab ! Thanks for contributing! Could you please add an integration test for it?

paveltiunov avatar Feb 09 '23 17:02 paveltiunov

The latest updates on your projects. Learn more about Vercel for Git ā†—ļøŽ

8 Skipped Deployments
Name Status Preview Comments Updated (UTC)
examples-angular-dashboard ā¬œļø Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-d3 ā¬œļø Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-dashboard ā¬œļø Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-data-table ā¬œļø Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-highcharts ā¬œļø Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-material-ui ā¬œļø Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-react-pivot-table ā¬œļø Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm
examples-vue-query-builder ā¬œļø Ignored (Inspect) Visit Preview Nov 22, 2024 1:40pm

vercel[bot] avatar Jun 07 '24 14:06 vercel[bot]

@casab is attempting to deploy a commit to the Cube Dev Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] avatar Jun 07 '24 14:06 vercel[bot]

@casab May I kindly ask you to test with the latest release and rebase your changes on top of it? We have migrated to a new ClickHouse client library recently. Thanks in advance!

igorlukanin avatar Nov 22 '24 13:11 igorlukanin

@igorlukanin Can you approve running the workflows?

casab avatar Nov 22 '24 13:11 casab

Hi @casab Could you please sync with theĀ latest master,Ā resolve conflicts andĀ fix warnings/errors? Ping me whenever you need toĀ approve running workflows! Btw, forĀ aĀ quick linter fixes you can run yarn run lint:fix

KSDaemon avatar Jan 17 '25 10:01 KSDaemon

@KSDaemon of course, done it.

casab avatar Jan 24 '25 14:01 casab

@casab Hey! Thnx forĀ theĀ updates! Unfortunately, some lint errors still exists:

  /home/runner/work/cube/cube/packages/cubejs-schema-compiler/src/adapter/ClickHouseQuery.ts
    178:1   error    Expected indentation of 6 spaces but found 8  indent
    178:16  warning  Strings must use singlequote                  quotes
    182:1   error    Expected indentation of 6 spaces but found 8  indent
    182:26  error    Unexpected string concatenation               prefer-template
    182:58  warning  Strings must use singlequote                  quotes
    183:1   error    Expected indentation of 6 spaces but found 8  indent
    184:1   error    Expected indentation of 6 spaces but found 8  indent
    184:16  warning  Strings must use singlequote                  quotes
    188:1   error    Expected indentation of 6 spaces but found 8  indent
    188:16  warning  Strings must use singlequote                  quotes
    192:1   error    Expected indentation of 2 spaces but found 0  indent

Maybe thats because ofĀ merge... But anyway...

Can you fix them, please?

KSDaemon avatar Jan 24 '25 17:01 KSDaemon

@KSDaemon Yeah sorry about that. I have fixed the linter issues now. It's just that this PR is haunting me for 2 years...

casab avatar Jan 24 '25 18:01 casab

Ah.. Sorry toĀ hear that! But... Don't worry! We'll nail it down!

KSDaemon avatar Jan 24 '25 18:01 KSDaemon

Let me check these integration test results and I'll make the necessary changes.

casab avatar Jan 24 '25 19:01 casab

@KSDaemon I think it's alright now. Apparently I wasn't checking if the order field's type is string or not. I also modified the other integration tests for the new test case I have added.

casab avatar Jan 24 '25 20:01 casab

@casab Yeah, great addition, thatĀ was one ofĀ my questions, I want toĀ ask! :) What about theĀ collation: what if theĀ underlying values are inĀ a different language other than English? Current code hardcode theĀ collation 'en'.

KSDaemon avatar Jan 24 '25 20:01 KSDaemon

@KSDaemon I thought about it. When I created this PR, we were having problems with Turkish ordering in the company I used to work. But the biggest problem is not even the Turkish characters. Ordering is not case insensitive. 'en' would solve huge portion of that. I thought about adding a parameter to the query. Or maybe fetch it from cube definition.

What do you suggest? Where can we get the selected collation from?

casab avatar Jan 24 '25 20:01 casab

Codecov Report

Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.73%. Comparing base (1203291) to head (79373d6). Report is 343 commits behind head on master.

Files with missing lines Patch % Lines
...t/cubesql/cubesql/src/compile/engine/df/wrapper.rs 96.15% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6143      +/-   ##
==========================================
- Coverage   83.57%   80.73%   -2.84%     
==========================================
  Files         227      227              
  Lines       81618    81646      +28     
==========================================
- Hits        68210    65916    -2294     
- Misses      13408    15730    +2322     
Flag Coverage Ī”
cubesql 80.73% <96.55%> (-2.84%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Feb 06 '25 11:02 codecov[bot]

@KSDaemon @mcheshkov Hey, I have fixed all the errors. And implemented all the required parts. I would appreciate it if you can review it. Especially the rust part.

casab avatar Feb 11 '25 18:02 casab

@casab We're actively looking into merging this PR. Could you please resolve the conflicts and resolve on top of master once again? Thanks in advance!

igorlukanin avatar Jun 09 '25 10:06 igorlukanin

@igorlukanin sure, I'll look into it and let you know.

casab avatar Jun 09 '25 11:06 casab