dbt-sqlserver icon indicating copy to clipboard operation
dbt-sqlserver copied to clipboard

[Bug] Creating nonclustered indexes in a SQL Server materialized table is resulting in an orphaned clustered columnstore index

Open ianlcassidy opened this issue 1 year ago • 3 comments

Originally posted in dbt-core, but moving here since this seems to be a bug with the create_nonclustered_index macro.

Current Behavior

When creating a materialized table and specifying nonclustered indexes in the config post-hook, a clustered columnstore index is created on the dbt tmp table and then the tmp table is renamed. This is resulting in our database backup failing (we use Commvault) because the clustered columnstore index is "orphaned."

Expected Behavior

We expect the clustered columnstore index to be created after the tmp table is renamed, similar to how the nonclustered indexes are created. We have tested this theory by modifying the execution steps in the dbt.log file and our database backup succeeds.

Steps To Reproduce

  1. use dbt to create an arbitrary materialized table using the sql server adaptor
  2. specify at least one nonclustered index in the config post-hook
  3. run dbt on the single model
  4. inspect the DDL or look at the indexes created including the clustered columnstore index

Relevant log output

use [DATABASE];
if EXISTS (
   SELECT *
   FROM sys.indexes with (nolock)
   WHERE name = 'SCHEMA_TABLE__dbt_tmp_cci'
   AND object_id=object_id('SCHEMA_TABLE__dbt_tmp')
)
DROP index "SCHEMA"."TABLE__dbt_tmp".SCHEMA_TABLE__dbt_tmp_cci
CREATE CLUSTERED COLUMNSTORE INDEX SCHEMA_TABLE__dbt_tmp_cci    <---- index created on tmp table
ON "SCHEMA"."TABLE__dbt_tmp"

USE [DATABASE];
      EXEC sp_rename 'SCHEMA.TABLE', 'TABLE__dbt_backup'

USE [USB-DataScience];
      EXEC sp_rename 'SCHEMA.TABLE__dbt_tmp', 'TABLE'    <---- tmp table renamed

if not exists(select *
              from sys.indexes with (nolock)
              where name = 'nonclustered_6eb6a6764d9714280e37441d1ac3d5ae'
              and object_id = OBJECT_ID('"DATABASE"."SCHEMA"."TABLE"')
)
begin
create nonclustered index   <--- index created on actual table
       nonclustered_6eb6a6764d9714280e37441d1ac3d5ae
       on "DATABASE"."SCHEMA"."TABLE" ([FIELD])
        
end

Environment

- OS: Windows 10
- Python: 3.11.9
- dbt: 1.8.7

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

We are using the dbt SQL Server adapter.

Sample dbt model code:

{{
    config({
        "materialized": "table",
        "post-hook": [
            "{{ create_nonclustered_index(columns = ['FIELD']) }}",
        ],
    })
}}

select
  *
from
  {{ source('SCHEMA', 'TABLE') }};

ianlcassidy avatar Nov 01 '24 15:11 ianlcassidy

Additionally, if we use the create_clustered_index macro then the dbt run fails completely. Adding the below to a model

"{{ create_clustered_index(columns = ['date_key'], unique=True) }}"

causes the dbt build to fail with the below since there's already an orphaned clustered index on the table.

Database Error in model dim_date (models\marts\dim_date.sql)
  ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]
Cannot create more than one clustered index on table 'dev.dbo.dim_date'.
Drop the existing clustered index 'dbo_dim_date__dbt_tmp_cci' before creating another. (1902) (SQLExecDirectW)")

MiConnell avatar Feb 01 '25 00:02 MiConnell

Hmm, I wonder if there needs to be an adjustment on where the index is created. Right now its happening post table creation here https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/relations/table/create.sql#L41

I think one issue is the relation is being created on the intermediate relation here https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/include/sqlserver/macros/materializations/models/table/table.sql#L31C39-L31C60 which is not the same as the original one. So when its created it then has a rename done as the final step.

Not sure if this is where to start, or if its someplace closer to the actual table creation statement. Ideally if its closer to the table creation then anything creating a table will pull from that, otherwise it might need to live somewhere in the incremental/table mats.

cody-scott avatar Feb 10 '25 15:02 cody-scott

Perhaps moving it over to the create_indexes is part of it, but that would need to get overwritten i suspect.

cody-scott avatar Feb 10 '25 15:02 cody-scott