[Bug] Creating nonclustered indexes in a SQL Server materialized table is resulting in an orphaned clustered columnstore index
Originally posted in dbt-core, but moving here since this seems to be a bug with the create_nonclustered_index macro.
Current Behavior
When creating a materialized table and specifying nonclustered indexes in the config post-hook, a clustered columnstore index is created on the dbt tmp table and then the tmp table is renamed. This is resulting in our database backup failing (we use Commvault) because the clustered columnstore index is "orphaned."
Expected Behavior
We expect the clustered columnstore index to be created after the tmp table is renamed, similar to how the nonclustered indexes are created. We have tested this theory by modifying the execution steps in the dbt.log file and our database backup succeeds.
Steps To Reproduce
- use dbt to create an arbitrary materialized table using the sql server adaptor
- specify at least one nonclustered index in the config post-hook
- run dbt on the single model
- inspect the DDL or look at the indexes created including the clustered columnstore index
Relevant log output
use [DATABASE];
if EXISTS (
SELECT *
FROM sys.indexes with (nolock)
WHERE name = 'SCHEMA_TABLE__dbt_tmp_cci'
AND object_id=object_id('SCHEMA_TABLE__dbt_tmp')
)
DROP index "SCHEMA"."TABLE__dbt_tmp".SCHEMA_TABLE__dbt_tmp_cci
CREATE CLUSTERED COLUMNSTORE INDEX SCHEMA_TABLE__dbt_tmp_cci <---- index created on tmp table
ON "SCHEMA"."TABLE__dbt_tmp"
USE [DATABASE];
EXEC sp_rename 'SCHEMA.TABLE', 'TABLE__dbt_backup'
USE [USB-DataScience];
EXEC sp_rename 'SCHEMA.TABLE__dbt_tmp', 'TABLE' <---- tmp table renamed
if not exists(select *
from sys.indexes with (nolock)
where name = 'nonclustered_6eb6a6764d9714280e37441d1ac3d5ae'
and object_id = OBJECT_ID('"DATABASE"."SCHEMA"."TABLE"')
)
begin
create nonclustered index <--- index created on actual table
nonclustered_6eb6a6764d9714280e37441d1ac3d5ae
on "DATABASE"."SCHEMA"."TABLE" ([FIELD])
end
Environment
- OS: Windows 10
- Python: 3.11.9
- dbt: 1.8.7
Which database adapter are you using with dbt?
other (mention it in "Additional Context")
Additional Context
We are using the dbt SQL Server adapter.
Sample dbt model code:
{{
config({
"materialized": "table",
"post-hook": [
"{{ create_nonclustered_index(columns = ['FIELD']) }}",
],
})
}}
select
*
from
{{ source('SCHEMA', 'TABLE') }};
Additionally, if we use the create_clustered_index macro then the dbt run fails completely.
Adding the below to a model
"{{ create_clustered_index(columns = ['date_key'], unique=True) }}"
causes the dbt build to fail with the below since there's already an orphaned clustered index on the table.
Database Error in model dim_date (models\marts\dim_date.sql)
('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]
Cannot create more than one clustered index on table 'dev.dbo.dim_date'.
Drop the existing clustered index 'dbo_dim_date__dbt_tmp_cci' before creating another. (1902) (SQLExecDirectW)")
Hmm, I wonder if there needs to be an adjustment on where the index is created. Right now its happening post table creation here https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/relations/table/create.sql#L41
I think one issue is the relation is being created on the intermediate relation here https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/materializations/models/table/table.sql#L31C39-L31C60 which is not the same as the original one. So when its created it then has a rename done as the final step.
Not sure if this is where to start, or if its someplace closer to the actual table creation statement. Ideally if its closer to the table creation then anything creating a table will pull from that, otherwise it might need to live somewhere in the incremental/table mats.
Perhaps moving it over to the create_indexes is part of it, but that would need to get overwritten i suspect.