Cannot use use_original_sql_pre_aggregations on Postgres
Describe the bug
We would like to reuse an original_sql pre-aggregation stored in Postgres in a rollup pre-aggregation. Setting use_original_sql_pre_aggregations: true fails with the following error message during scheduled refresh:
"... FROM pre_agg_tenant_123.base_positions_main AS "main__base_positions"..." {"message":"Refresh Scheduler Error","error":"Error: relation "pre_agg_tenant_123.base_positions_main" does not exist",
When I introspect the database, I can see that Cube appends certain hashes to the original_sql table name. Those hashes are missing in the sql query to the database, which causes it to fail.
Expected behavior It should be possible to reuse the original_sql pre-aggregation stored in Postgres, as stated in the docs. We have a heavy sql query in the original data model.
Minimally reproducible Cube Schema
cubes:
- name: base_positions
public: false
sql: SELECT "id_1" AS position_id, "test" AS position_name
measures:
- name: position_count
type: count
dimensions:
- name: position_id
sql: position_id
type: string
primary_key: true
- name: position_name
sql: position_name
type: string
- name: position_date
sql: "date"
type: time
pre_aggregations:
- name: main
type: original_sql
external: false
- name: positions_per_month
measures:
- CUBE.position_count
time_dimension: CUBE.position_date
granularity: month
use_original_sql_pre_aggregations: true
Version: 0.34.19
If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.
This also affects me. I have a number of expensive underlying queries that it would be great to accelerate with originalsql preaggs. It would be particularly great if the tables could be aliased with a view with a consistent name, as then they could also be queried outside of cube.
I was able to reproduce this with Cube v0.36.2 (latest), same behavior as reported:
I have an extended write-up on my work around for this here: https://gist.github.com/marcintustin/08dab56a073696977b9b1ca6654ce9db
Here's the main text, which refers to code in the gist:
Specifically the observed behaviour of cube before this hack is deployed is that cube creates internal preaggregation tables with names like bb_orig_remvxdcr_f4wglftw_1jgg62u but appears not to create the corresponding undecorated names like bb_orig. Cube wants to read from bb_orig when using that preaggregation (whether to build a rollup preaggregation or to answer a query).
This hack creates views with names like bb_orig. When that happens cube happily reads from the views and originalsql preaggregations start to mostly work. Unfortunately, once the views are in place it becomes apparent that cube wants to drop them but can't (because it's issuing a drop table) and logs an error. Everything else appears to work, though. I haven't attempted to untangle whether cube is creating the tables named like bb_orig and immediately dropping them, or just not doing it.
In order to determine which of the multiple tables with the same stem should be used to make the view (e.g. which of the tables starting bb_orig), this requires that the originalsql preagg have a column aggupdatedts. If this were folded into cube, presumably it could keep track of which was the latest table instead of requiring the user to explicitly create the column.
One feature that would have made the cube nicer to write would be a way to specify the originalsql table in the refreshKey.sql; or a feature to have the default refresh for a preagg that uses original sql to be based on the update of the originalsql table completing.
The rest of this gist consists of a python file, its accompanying requirements.txt, and an example cube file.
We are facing same issue with Athena as well.