Add config blocks for the "source" models
Overview
Adds config blocks to the upstream "source" models according to the dbt project file.
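For context, a config block set inside the model file itself takes precedence over materialization settings in a consuming project's dbt_project.yml, which is what makes this change protective. Roughly the kind of block being added (a sketch; the exact options are an assumption on my part, mirroring what the package already sets in its own dbt_project.yml):

```sql
-- Sketch of a model-level config block for one of the "source" models.
-- Option values mirror the package's existing project-file configuration.
{{
    config(
        materialized='incremental',
        on_schema_change='append_new_columns'
    )
}}
```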
Update type - breaking / non-breaking
- [ ] Minor bug fix
- [ ] Documentation improvements
- [x] Quality of Life improvements
- [x] New features (non-breaking change)
- [ ] New features (breaking change)
- [ ] Other (non-breaking change)
- [ ] Other (breaking change)
- [ ] Release preparation
What does this solve?
I've been working on standardizing model configs in my project, and I set the default materialization to `table` in my dbt_project.yml. That appears to have overridden the materialization for these "source" models, such that subsequent runs were doing a CREATE OR REPLACE and wiping out all of the data without my knowing. At least that is what I think happened. I copied the project config settings from this repo into my own dbt_project.yml, including the sources block, like so:
```yaml
dbt_artifacts:
  +schema: dbt_artifacts
  staging:
    +schema: dbt_artifacts
  sources:
    +materialized: incremental
    +on_schema_change: append_new_columns
```
and I confirmed that my dbt runs now look like:
```
21:19:30  1 of 34 START sql incremental model dbt_artifacts.exposures .................... [RUN]
21:19:30  2 of 34 START sql incremental model dbt_artifacts.invocations .................. [RUN]
...
```
instead of what they used to look like:
```
[2023-10-02, 21:02:38 UTC] {dbt_hook.py:191} INFO - 21:02:38  1 of 34 START sql table model dbt_artifacts.exposures .......................... [RUN]
[2023-10-02, 21:02:38 UTC] {dbt_hook.py:191} INFO - 21:02:38  2 of 34 START sql table model dbt_artifacts.invocations ........................ [RUN]
```
Outstanding questions
Is this actually the expected behavior of dbt_project.yml configs? I would have thought that, because the materialization for the sources block was set in this package's project config, I wouldn't be able to override it like I did.
Is there anything lost by setting these in config blocks instead of project configs? I don't feel strongly about this change; I'm just throwing it out there because it would have saved me from losing all my data.
What databases have you tested with?
- [x] Snowflake
- [ ] Google BigQuery
- [ ] Databricks
- [ ] Spark
- [ ] N/A
Hi @jsnb-devoted thanks for opening this PR. I just wanted to clarify my understanding on this.
In your dbt_project.yml file you have configured the materialisation for dbt_artifacts models?
Hi @jared-rimmer -- This was my project config:
```yaml
models:
  +persist_docs:
    relation: true
    columns: true
  devoted:
    <... all my project configs>
  dbt_artifacts:
    +schema: dbt_artifacts
    staging:
      +schema: dbt_artifacts
```
Then I made the default materialization for my project `table`, so it looked like this:
```yaml
models:
  +materialized: table
  +transient: true
  devoted:
    <... all my project configs>
  dbt_artifacts:
    +schema: dbt_artifacts
    staging:
      +schema: dbt_artifacts
```
And my understanding is that this overrode the "source" models' materialization from the dbt_artifacts dbt_project.yml, and it started dropping and recreating those tables -- I didn't notice for long enough that the Snowflake Time Travel window expired and I lost all the data.
I assumed that the project config in dbt_artifacts couldn't be overridden like that. Assuming it's not just user error on my part, I figure you probably don't want this happening to anyone -- you don't want those models to ever have their materialization overridden.
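If I understand dbt's config scoping correctly, I could also have avoided this on my side by nesting the default under my own project name rather than setting it globally, something like the sketch below (`devoted` is my project; this assumes standard dbt config scoping):

```yaml
# Sketch: scope the default to my own project so it doesn't cascade into
# installed packages like dbt_artifacts.
models:
  devoted:
    +materialized: table
    +transient: true
```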
bump - @jared-rimmer any thoughts on this?
Anyone? Maybe @jaypeedevlin, since it looks like you implemented the original feature? Happy to close this if it was just user error or if there isn't any interest.
The current documentation has this note around configurations; maybe it addresses your query:

> Note that model materializations and on_schema_change configs are defined in this package's dbt_project.yml, so do not set them globally in your dbt_project.yml (see docs on configuring packages):
>
> Configurations made in your dbt_project.yml file will override any configurations in a package (either in the dbt_project.yml file of the package, or in config blocks).
On a side note, with 2.6.1 I get an error on `dbt run -s dbt_artifacts`; I had to add `incremental_strategy` to my dbt_project.yml. I am still checking if I am missing something:
```
Compilation Error in model model_executions (models/sources/model_executions.sql)
  Invalid incremental strategy provided: None
  Expected one of: 'append', 'merge', 'insert_overwrite'

  > in macro dbt_spark_validate_get_incremental_strategy (macros/materializations/incremental/validate.sql)
```
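For reference, this is roughly what I added to work around it; the choice of `merge` below is just my assumption of a sensible default, and any of the strategies listed in the error should pass the validation:

```yaml
# Workaround sketch: give the package's models an explicit incremental strategy
# (the strategy value here is an assumption, not something the package documents).
models:
  dbt_artifacts:
    +incremental_strategy: merge
```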
This is a good callout, @hash0b. I did not see this note in the README -- had I read it, I probably wouldn't have lost all that data.
I don't feel that strongly about it, but it seems like there isn't any benefit to letting someone override the materialization for these particular "source"-style models. You can prevent that from happening by hardcoding the materialization like I did in this PR. It's no longer an issue for me, so it's really up to the maintainers @jared-rimmer @jaypeedevlin.
Hi all,
Just to add here that I no longer work for Brooklyn Data, so I am not actively involved in maintaining this package.