feat(redshift) support approx count distinct
Check List
- [ ] Tests has been run in packages where changes made if available
- [ ] Linter has been run for changed code
- [ ] Tests for the changes have been added if not covered yet
- [ ] Docs have been added / updated if required
Add support for redshift approx distinct calls (i think) please feel free to make changes or whatever to help get this through. I would just like to start using this for additive measures in cube and would love this to get in sooner rather than later 😅
Docs: https://docs.aws.amazon.com/redshift/latest/dg/hyperloglog-functions.html
Issue Reference this PR resolves Fixes: https://github.com/cube-js/cube/issues/8510
Description of Changes Made (if issue reference is not provided)
[Description goes here]
The latest updates on your projects. Learn more about Vercel for Git ↗︎
8 Skipped Deployments
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| examples-angular-dashboard | ⬜️ Ignored (Inspect) | Visit Preview | Sep 9, 2024 4:05pm | |
| examples-react-d3 | ⬜️ Ignored (Inspect) | Visit Preview | Sep 9, 2024 4:05pm | |
| examples-react-dashboard | ⬜️ Ignored (Inspect) | Visit Preview | Sep 9, 2024 4:05pm | |
| examples-react-data-table | ⬜️ Ignored (Inspect) | Visit Preview | Sep 9, 2024 4:05pm | |
| examples-react-highcharts | ⬜️ Ignored (Inspect) | Visit Preview | Sep 9, 2024 4:05pm | |
| examples-react-material-ui | ⬜️ Ignored (Inspect) | Visit Preview | Sep 9, 2024 4:05pm | |
| examples-react-pivot-table | ⬜️ Ignored (Inspect) | Visit Preview | Sep 9, 2024 4:05pm | |
| examples-vue-query-builder | ⬜️ Ignored (Inspect) | Visit Preview | Sep 9, 2024 4:05pm |
If this gets through @paveltiunov's review, I think changes to this docs section would be needed: https://cube.dev/docs/product/configuration/data-sources/aws-redshift#count_distinct_approx
Tested with pre-aggregation:
- Successfully created pre-aggregation SQL
- Executing Load Pre Aggregation SQL: scheduler-ba104a65-512d-4700-aaeb-67caa9743162
--
CREATE TABLE dev_pre_aggregations.session_funnel_main20240101_ubycvf3l_wiz2w3y_1jafait AS SELECT
"session_funnel".step "session_funnel__step", date_trunc('day', ("session_funnel".t::timestamptz AT TIME ZONE 'UTC')) "session_funnel__time_day", count("session_funnel".user_id) "session_funnel__conversions", CASE WHEN HLL("session_funnel".first_step_user_id) > 0 THEN 100.0 * count("session_funnel".user_id) / HLL("session_funnel".first_step_user_id) ELSE NULL END "session_funnel__conversions_percent", HLL("session_funnel".first_step_user_id) "session_funnel__first_step_conversions"
FROM
Validated non pre-aggregated queries,
SELECT
date_trunc('day', ("session_funnel".t::timestamptz AT TIME ZONE 'UTC')) "session_funnel__time_day", CASE WHEN APPROXIMATE COUNT(DISTINCT "session_funnel".first_step_user_id) > 0 THEN 100.0 * count("session_funnel".user_id) / APPROXIMATE COUNT(DISTINCT "session_funnel".first_step_user_id) ELSE NULL END "session_funnel__conversions_percent"
FROM
(
```
@pauldheinrichs Thanks for contributing! I guess hllInit should use https://docs.aws.amazon.com/redshift/latest/dg/r_HLL_CREATE_SKETCH.html
@pauldheinrichs Thanks for contributing! I guess
hllInitshould use https://docs.aws.amazon.com/redshift/latest/dg/r_HLL_CREATE_SKETCH.html
@paveltiunov I attempted that in this https://github.com/cube-js/cube/pull/8512/commits/71e088275864cb58f2a9ed075732386e00701a3e commit actually and resulting not super succesfully when testing? Unless i tested wrong? 🤔
@pauldheinrichs Hard to tell what's went wrong but according to the doc HLL already does an estimation and it isn't something we want to happen in hllInit.
@pauldheinrichs It feels we need HLL_CARDINALITY(HLL_COMBINE(${sql})) for hllMerge and also HLL_COMBINE(${sql}) for hllMergeOnly. But this one needs to be tested.
@pauldheinrichs Given that AWS HLL format spec is proprietary and HLL support can't be also brought to Cube Store, I wonder how useful this addition to the Redshift driver is. WDYT?