cube icon indicating copy to clipboard operation
cube copied to clipboard

Cannot use use_original_sql_pre_aggregations on Postgres

Open TobiasMaehl-pIX opened this issue 2 years ago • 4 comments

Describe the bug We would like to reuse an original_sql pre-aggregation stored in Postgres in a rollup pre-aggregation. Setting use_original_sql_pre_aggregations: true fails with the following error message during scheduled refresh:

"... FROM pre_agg_tenant_123.base_positions_main AS "main__base_positions"..." {"message":"Refresh Scheduler Error","error":"Error: relation "pre_agg_tenant_123.base_positions_main" does not exist",

When I introspect the database, I can see that Cube appends certain hashes to the original_sql table name. Those hashes are missing in the sql query to the database, which causes it to fail.

Expected behavior It should be possible to reuse the original_sql pre-aggregation stored in Postgres, as stated in the docs. We have a heavy sql query in the original data model.

Minimally reproducible Cube Schema

cubes:
  - name: base_positions
    public: false
    sql: SELECT "id_1" AS position_id, "test" AS position_name
measures:
  - name: position_count
    type: count
dimensions:
  - name: position_id
    sql: position_id
    type: string
    primary_key: true
  - name: position_name
    sql: position_name
    type: string
  - name: position_date
    sql: "date"
    type: time

pre_aggregations:
  - name: main
    type: original_sql
    external: false
  - name: positions_per_month
    measures:
      - CUBE.position_count
    time_dimension: CUBE.position_date
    granularity: month
    use_original_sql_pre_aggregations: true

Version: 0.34.19

TobiasMaehl-pIX avatar Nov 15 '23 15:11 TobiasMaehl-pIX

If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.

github-actions[bot] avatar Nov 30 '23 13:11 github-actions[bot]

This also affects me. I have a number of expensive underlying queries that it would be great to accelerate with originalsql preaggs. It would be particularly great if the tables could be aliased with a view with a consistent name, as then they could also be queried outside of cube.

marcintustin avatar Sep 27 '24 13:09 marcintustin

I was able to reproduce this with Cube v0.36.2 (latest), same behavior as reported:

Screenshot 2024-10-08 at 23 34 36

igorlukanin avatar Oct 08 '24 21:10 igorlukanin

I have an extended write-up on my work around for this here: https://gist.github.com/marcintustin/08dab56a073696977b9b1ca6654ce9db

Here's the main text, which refers to code in the gist:

Specifically the observed behaviour of cube before this hack is deployed is that cube creates internal preaggregation tables with names like bb_orig_remvxdcr_f4wglftw_1jgg62u but appears not to create the corresponding undecorated names like bb_orig. Cube wants to read from bb_orig when using that preaggregation (whether to build a rollup preaggregation or to answer a query).

This hack creates views with names like bb_orig. When that happens cube happily reads from the views and originalsql preaggregations start to mostly work. Unfortunately, once the views are in place it becomes apparent that cube wants to drop them but can't (because it's issuing a drop table) and logs an error. Everything else appears to work, though. I haven't attempted to untangle whether cube is creating the tables named like bb_orig and immediately dropping them, or just not doing it.

In order to determine which of the multiple tables with the same stem should be used to make the view (e.g. which of the tables starting bb_orig), this requires that the originalsql preagg have a column aggupdatedts. If this were folded into cube, presumably it could keep track of which was the latest table instead of requiring the user to explicitly create the column.

One feature that would have made the cube nicer to write would be a way to specify the originalsql table in the refreshKey.sql; or a feature to have the default refresh for a preagg that uses original sql to be based on the update of the originalsql table completing.

The rest of this gist consists of a python file, its accompanying requirements.txt, and an example cube file.

marcintustin avatar Oct 17 '24 19:10 marcintustin

We are facing same issue with Athena as well.

kk-r avatar Feb 19 '25 10:02 kk-r