databricks - error creating elementary.dbt_columns/models/sources tables
**Describe the bug**
Errors are thrown for the dbt_columns, dbt_models and dbt_sources table creation during the first dbt run after Elementary is added to the dbt project:

```
03:53:00  Completed with 3 errors and 0 warnings:
03:53:00
03:53:00  Runtime Error in model dbt_columns (models\edr\dbt_artifacts\dbt_columns.sql)
03:53:00    [RequestId=4c2efc34-3ea5-4d1b-9afa-155f5ecae9be ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_columns' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_columns.
03:53:00
03:53:00  Runtime Error in model dbt_models (models\edr\dbt_artifacts\dbt_models.sql)
03:53:00    [RequestId=33c14b44-302b-48c9-a765-da35ae379a12 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_models' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_models.
03:53:00
03:53:00  Runtime Error in model dbt_sources (models\edr\dbt_artifacts\dbt_sources.sql)
03:53:00    [RequestId=f4d77330-7c89-403c-8f58-54069dd7c217 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_sources' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_sources.
```
**To Reproduce**
Steps to reproduce the behavior:
- `dbt run --select elementary`

**Expected behavior**
No errors.
**Environment:**
- Elementary CLI (edr) version (from `pip show elementary-data`): not installed
- Elementary dbt package version (from `packages.yml`): elementary-data/elementary 0.15.2
- dbt version:
  - Core: installed 1.5.11 (latest: 1.8.3 - update available)
  - Plugins: databricks 1.5.7, spark 1.5.3 (updates available)
- Data warehouse: Azure Databricks
- Infrastructure details: Azure
**Additional context**
This is a clean install and I'm using external tables. I tried updating dbt-core and dbt-databricks, but got the same error:

```
(dbt-dev) C:\git\aic_datalakehouse>dbt -v
using legacy validation callback
Core:
  - installed: 1.8.3
  - latest:    1.8.3 - Up to date!
Plugins:
  - databricks: 1.8.3 - Up to date!
  - spark:      1.8.0 - Up to date!
```
It looks like it failed to create these tables:

```
[RequestId=f4d77330-7c89-403c-8f58-54069dd7c217 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_sources' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_sources.
```
From what I see, this is a Databricks error regarding privileges: https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html
@NoyaArie thanks for looking into it
That's strange, as I am using a single account/token (mine - admin) to run it, and none of the other models in the project have problems. In fact those 3 tables (along with the rest - 24 in all?) are created.
The error refers to dbt/Elementary trying to create tables that overlap the same physical location in my blob/data lake storage.
OK, found it in the logs. It looks like Elementary is trying to create the temp/staging table in the same external location as the final table, which triggers the error. Allowing it would cause the final table to be overwritten with the staging data.
Are there any known workarounds?
```
[0m13:38:50.641894 [debug] [Thread-2 (]: On model.elementary.dbt_columns: /* {"app": "dbt", "dbt_version": "1.8.3", "dbt_databricks_version": "1.8.3", "databricks_sql_connector_version": "3.1.2", "profile_name": "aic_datalakehouse", "target_name": "prod", "node_id": "model.elementary.dbt_columns"} */
create or replace table `datalakehouse`.`elementary`.`dbt_columns__tmp_20240719040850594589`
using delta
location 'abfss://[email protected]/elementary/dbt_columns'
as
SELECT *
FROM `datalakehouse`.`elementary`.`dbt_columns`
WHERE 1 = 0
```
Hey @jakub-auger, sorry for the late response... 🫤
Were you able to resolve the issue? I think it might be related to dbt-databricks itself, and updating it may help.
Hi @ofek1weiss. No, I haven't included Elementary in my project since then.
I don't see what the fix within dbt-databricks would be - it's working as designed, stopping someone from saving different tables to the same data location. I'd be concerned if it let that happen!
Can you explain how the temp tables are created and what their purpose is? I use externally managed tables in Databricks.
A 'simple' way to fix the above issue would be to modify the location to include the temp table name - BUT Databricks doesn't delete the underlying data when an external table is dropped, so I'd be left with a plethora of `./__tmp_2345i9304959` directories in my data lake.
Is Elementary not compatible with being set up as externally managed tables in Databricks?
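To make the overlap concrete: the 'simple' workaround described above would mean the generated staging DDL gives the temp table its own path. A hypothetical sketch (the suffixed `location` is illustrative only - it is not what Elementary or dbt-databricks emits today):

```sql
create or replace table `datalakehouse`.`elementary`.`dbt_columns__tmp_20240719040850594589`
using delta
-- hypothetical: suffix the path so it no longer overlaps the final table's
-- location, at the cost of orphaned __tmp_* directories left behind when
-- the external table is later dropped
location 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_columns__tmp_20240719040850594589'
as
select * from `datalakehouse`.`elementary`.`dbt_columns`
where 1 = 0
```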
@ofek1weiss update: it did not work with the latest version of dbt.
It did work once I switched Elementary to use managed tables. I'd recommend adding that somewhere in the docs.
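For reference, a minimal sketch of what "switching Elementary to managed tables" can look like in `dbt_project.yml`, assuming the external location was being applied via the dbt-databricks `location_root` config (`your_project` and the path are placeholder names; adjust to your setup):

```yaml
# dbt_project.yml (sketch - names and paths are placeholders)
models:
  your_project:
    # your own models can keep using external tables
    +location_root: 'abfss://datalakehouse@****.dfs.core.windows.net/models'
  elementary:
    # Elementary's recommended dedicated schema
    +schema: elementary
    # no +location_root here: dbt-databricks then creates the Elementary
    # tables as managed tables, so the __tmp staging table no longer
    # collides with the final table's external path
```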
I encountered the same issue - it has nothing to do with dbt-databricks. Managed tables are currently the only way to get this working.