Assertions fail on partitioned BigQuery tables with `requirePartitionFilter` enabled
Problem Summary:
I found that assertions for SQLX scripts fail with the following error when config.bigquery.requirePartitionFilter: true:
Cannot query over table 'dataform.repro' without a filter over column(s) 'date' that can be used for partition elimination
Minimal Reproduction:
config {
type: "incremental", // or "table"
bigquery: {
partitionBy: "date",
requirePartitionFilter: true,
},
assertions: {
uniqueKey: "id"
}
}
SELECT
CURRENT_DATE('Japan') AS date,
1 AS id,
The error message originates from BigQuery. I found BigQueryAdaptor uses Adaptor.indexAssertion for the assertion, ignoring BigQueryOptions.
This means the generated assertion SQL does not contain filters for partition columns.
Problem Details:
This is the assertion SQL for the minimal reproduction.
SELECT
*
FROM (
SELECT
id,
COUNT(1) AS index_row_count
FROM `project.dataform.repro`
GROUP BY id
// missing a filter for column "id"
) AS data
WHERE index_row_count > 1
The SQL is generated by Adaptor.indexAssertion and it is missing the filter. This is because that BigQueryAdaptor{.indexAssertion || .rowConditionsAssertion} disregards BigQueryOptions during assertions, causing partition settings to be neglected(=assertion query does not contain partition clause.).
BigQueryAdaptor should have implementations for the methods, like below.
export class BigQueryAdapter extends Adapter implements IAdapter {
// for uniqueKey
public indexAssertion(dataset: string, indexCols: string[]) {
// do something to filter partition columns
const partitionColumnStatement = ...
return `
SELECT
*
FROM (
SELECT
${commaSeparatedColumns},
COUNT(1) AS index_row_count
FROM ${dataset}
GROUP BY ${commaSeparatedColumns}
AND ${partitionColumnStatement}
) AS data
WHERE index_row_count > 1
`
}
}
I just hit the same problem with equivalent context - a bigquery partitioned table and assertions from within the file config block. the assertions are failing due to the wrong sql that is auto-generated.
my solution is to avoid using the built-in assertions from within the config block as you cannot rely on the sql it generates.
it's been quite some time since this was reported - i was wondering if there were any developments in the meantime ? @Ekrekr thank you in advance for your thoughts.
Thanks @RyuSA for raising this. I have also encountered the same problem, and also working around by explicitly creating downstream assertions