databricks - error creating elementary.dbt_columns/models/sources tables

Open jakub-auger opened this issue 1 year ago • 6 comments

Describe the bug Errors are thrown when creating the dbt_columns, dbt_models and dbt_sources tables during the first dbt run after elementary is added to the dbt project

03:53:00 Completed with 3 errors and 0 warnings:
03:53:00
03:53:00 Runtime Error in model dbt_columns (models\edr\dbt_artifacts\dbt_columns.sql)
03:53:00 [RequestId=4c2efc34-3ea5-4d1b-9afa-155f5ecae9be ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@.dfs.core.windows.net/elementary/dbt_columns' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_columns.
03:53:00
03:53:00 Runtime Error in model dbt_models (models\edr\dbt_artifacts\dbt_models.sql)
03:53:00 [RequestId=33c14b44-302b-48c9-a765-da35ae379a12 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@.dfs.core.windows.net/elementary/dbt_models' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_models.
03:53:00
03:53:00 Runtime Error in model dbt_sources (models\edr\dbt_artifacts\dbt_sources.sql)
03:53:00 [RequestId=f4d77330-7c89-403c-8f58-54069dd7c217 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_sources' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_sources.

To Reproduce Steps to reproduce the behavior:

  1. dbt run --select elementary

Expected behavior no errors

Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • Elementary CLI (edr) version: not installed

  • Elementary dbt package version (from packages.yml):

    • package: elementary-data/elementary version: 0.15.2
  • dbt version you're using [e.g. 1.8.1] Core:

    • installed: 1.5.11
    • latest: 1.8.3 - Update available!

    Your version of dbt-core is out of date! You can find instructions for upgrading here: https://docs.getdbt.com/docs/installation

Plugins:

  • databricks: 1.5.7 - Update available!
  • spark: 1.5.3 - Update available!
  • Data warehouse: Azure Databricks
  • Infrastructure details (e.g. operating system, prod / dev / staging, deployment infra, CI system, etc): Azure

Additional context This is a clean install. I'm using external tables.

I tried updating dbt-core and dbt-databricks, but got the same error:

(dbt-dev) C:\git\aic_datalakehouse>dbt -v using legacy validation callback Core:

  • installed: 1.8.3
  • latest: 1.8.3 - Up to date!

Plugins:

  • databricks: 1.8.3 - Up to date!
  • spark: 1.8.0 - Up to date!

jakub-auger avatar Jul 15 '24 04:07 jakub-auger

It looks like it failed to create these tables.

[RequestId=f4d77330-7c89-403c-8f58-54069dd7c217 ErrorClass=INVALID_PARAMETER_VALUE.LOCATION_OVERLAP] Input path url 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_sources' overlaps with other external tables or volumes within 'CreateTable' call. Conflicting tables/volumes: datalakehouse.elementary.dbt_sources.

From what I see, this is a Databricks error regarding privileges: https://docs.databricks.com/en/sql/language-manual/sql-ref-external-locations.html

NoyaArie avatar Jul 17 '24 11:07 NoyaArie

@NoyaArie thanks for looking into it

That's strange, as I am using a single account/token (mine - admin) to run it. None of the other models in the project have problems.

In fact those 3 tables (along with the rest - 24 all up?) are created.

The error refers to dbt/elementary trying to create tables that overlap the same physical location in my blob/datalake storage.

OK, found it in the logs.

It looks like elementary is trying to create the temp/staging table in the same external location as the final table, which triggers the error. Allowing it would cause the final table to be overwritten with the staging data.

Are there any known workarounds?

`13:38:50.641894 [debug] [Thread-2 (]: On model.elementary.dbt_columns: /* {"app": "dbt", "dbt_version": "1.8.3", "dbt_databricks_version": "1.8.3", "databricks_sql_connector_version": "3.1.2", "profile_name": "aic_datalakehouse", "target_name": "prod", "node_id": "model.elementary.dbt_columns"} */

    create or replace table `datalakehouse`.`elementary`.`dbt_columns__tmp_20240719040850594589`
  
  using delta
  location 'abfss://datalakehouse@****.dfs.core.windows.net/elementary/dbt_columns'
  
  
  as
  
    SELECT
    
        *
    
    FROM `datalakehouse`.`elementary`.`dbt_columns`
    WHERE 1 = 0

`
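
To make the collision in the logged statement concrete, here is a minimal Python sketch. It is not dbt-databricks or elementary code: `LOCATION_ROOT`, `external_location` and `locations_overlap` are hypothetical names, and the overlap rule (two paths overlap if they are equal or one is nested inside the other) is an assumption about what the LOCATION_OVERLAP check enforces.

```python
from posixpath import join

# Hypothetical storage root; the account name is masked as in the log.
LOCATION_ROOT = "abfss://datalakehouse@****.dfs.core.windows.net/elementary"


def external_location(root: str, name: str) -> str:
    """Build a table path as <root>/<name>."""
    return join(root, name)


def locations_overlap(a: str, b: str) -> bool:
    """Assumed rule: paths overlap if equal or one is nested in the other."""
    a, b = a.rstrip("/") + "/", b.rstrip("/") + "/"
    return a.startswith(b) or b.startswith(a)


final_table = "dbt_columns"
tmp_table = "dbt_columns__tmp_20240719040850594589"

# As in the logged statement: the temp table reuses the model's path.
final_path = external_location(LOCATION_ROOT, final_table)
tmp_path_from_model = external_location(LOCATION_ROOT, final_table)

# Alternative: derive the path from the temp table's own name.
tmp_path_from_relation = external_location(LOCATION_ROOT, tmp_table)

print(locations_overlap(final_path, tmp_path_from_model))     # True
print(locations_overlap(final_path, tmp_path_from_relation))  # False
```

Under these assumptions, a path built from the model name always collides with the final table, while a path built from the temp table's own name does not.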

jakub-auger avatar Jul 19 '24 04:07 jakub-auger

Hey @jakub-auger, sorry for the late response... 🫤 Were you able to resolve the issue? I think it might be related to dbt-databricks itself, and updating it may help.

ofek1weiss avatar Sep 26 '24 09:09 ofek1weiss

Hi @ofek1weiss. No, I haven't included elementary in my project since then.

I don't see what the fix within dbt-databricks would be. It's working as designed - stopping someone from trying to save different tables in the same data location. I'd be concerned if it let that happen!

Can you explain how the temp tables are created and what their purpose is? I use externally managed tables in Databricks.

A 'simple' way to fix the above issue is to modify the location to include the temp table name - BUT Databricks doesn't delete the raw data when an external table is dropped, so I'd be left with a plethora of ./__tmp_2345i9304959 directories in my datalake.
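
A toy Python model of that side effect, for illustration only: `ToyCatalog` and its methods are hypothetical and do not call any Databricks API. The sketch only assumes what the comment states, namely that dropping an external table removes the metadata entry but leaves the data files in storage.

```python
class ToyCatalog:
    """Toy model of external tables: metadata and storage are tracked separately."""

    def __init__(self) -> None:
        self.tables: dict[str, str] = {}  # table name -> storage path
        self.storage: set[str] = set()    # paths that hold data files

    def create_external(self, name: str, path: str) -> None:
        self.tables[name] = path
        self.storage.add(path)

    def drop_external(self, name: str) -> None:
        # Only the metadata entry goes away; the files stay in storage.
        del self.tables[name]


catalog = ToyCatalog()
root = "elementary"  # hypothetical relative root, shortened for readability
catalog.create_external("dbt_columns", f"{root}/dbt_columns")

for run in range(3):
    tmp = f"dbt_columns__tmp_{run}"  # hypothetical suffix; real ones are timestamps
    catalog.create_external(tmp, f"{root}/{tmp}")
    catalog.drop_external(tmp)

# Paths that still hold data but no longer belong to any table.
orphans = sorted(catalog.storage - set(catalog.tables.values()))
print(orphans)
```

Each run leaves one more orphaned directory behind, so this workaround would need a separate cleanup step for the storage paths.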

Is elementary not compatible with being set up as externally managed tables in Databricks?

jakub-auger avatar Sep 30 '24 02:09 jakub-auger

@ofek1weiss update: it did not work with the latest version of dbt.

It did work once I switched elementary to use managed tables. I recommend adding that somewhere in the docs.

jakub-auger avatar Sep 30 '24 06:09 jakub-auger

I encountered the same issue - it has nothing to do with dbt-databricks. Using managed tables is currently the only way to get this working.

eckesru avatar Jun 22 '25 14:06 eckesru