dbt task in asset bundle deployment: errors if `artifacts` is included and `git_source` is missing, uses an inaccurate project directory if both `artifacts` and `git_source` are missing
Describe the issue
The more I attempt to troubleshoot this, the less sure I am about what is a bug and what is by design but confusing.
I originally had an issue because I added a dbt task to my pipeline and forgot to add the git source for the dbt task.
When I attempted to deploy the updated asset bundle, I got the error message: `build failed <package_name>: error chdir <bundle_path>: no such file or directory, output .`
While troubleshooting this I found that if I remove the `artifacts` section from my asset bundle, the deployment succeeds, but the dbt task assumes that the project directory is the asset bundle deployment location, e.g. /Shared/.bundle/dbx_data_quality/dev/files. I assume that this location being used as the project directory for the dbt task is the reason for the error and the failed deployment, but it still seems like a bug, because the path actually did exist already.
Deploying this way results in a task that has these arguments:
I solved the issue by adding a `git_source` section to my job in the asset bundle, which keeps the project directory from being set on the dbt task at all.
```yaml
git_source:
  git_branch: develop
  git_provider: azureDevOpsServices
  git_url: https://<organization>@dev.azure.com/<organization>/<project>/_git/dbx-dbt-legacy
```
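For context, this is roughly where that section sits in the job definition. This is a sketch pieced together from the snippet above and the repro config below (in my bundle the `resources.jobs` block is nested under `targets.dev`), not a verbatim copy of my working configuration:

```yaml
resources:
  jobs:
    dbx_data_quality:
      name: dbx_data_quality (dev)
      git_source:
        git_branch: develop
        git_provider: azureDevOpsServices
        git_url: https://<organization>@dev.azure.com/<organization>/<project>/_git/dbx-dbt-legacy
      tasks:
        # dbt task as in the repro config below; the python wheel task is omitted here
        - dbt_task:
            catalog: dev
            commands:
              - dbt deps
              - dbt test
            schema: corrections
          task_key: dbt_tests
```

With `git_source` present, no project directory is set on the dbt task, as described above.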
Configuration
Please provide a minimal reproducible configuration for the issue
Steps to reproduce the behavior
To reproduce the `Error: build failed dbx_data_quality, ...` error, you need an asset bundle that contains a Python wheel task and a dbt task, with an `artifacts` section in the YAML that uses a relative path. The job cannot have a `git_source` section. The full bundle configuration is below, followed by the deploy command.
```yaml
bundle:
  name: dbx_data_quality

artifacts:
  dbx_pipeline_legacy:
    path: .
    type: whl

targets:
  dev:
    mode: development
    resources:
      jobs:
        dbx_data_quality:
          name: dbx_data_quality (dev)
          tasks:
            - job_cluster_key: basic_cluster
              libraries:
                - whl: ./dist/dbx_data_quality-*.whl
              python_wheel_task:
                entry_point: setup
                package_name: dbx_data_quality
              task_key: setup
            - dbt_task:
                catalog: dev
                commands:
                  - dbt deps
                  - dbt test
                schema: corrections
              depends_on:
                - task_key: setup
              job_cluster_key: basic_cluster
              libraries:
                - pypi:
                    package: dbt-databricks==1.7.8
              run_if: ALL_DONE
              task_key: dbt_tests
    workspace:
      profile: dev
```
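Deploying this bundle with debug logging enabled should then trigger the failure. The command below is the one referenced in the Debug Logs section (add `-t dev` to select the target explicitly if needed):

```sh
databricks bundle deploy --log-level=debug
```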
Expected Behavior
I'm not sure. The example of a dbt task in the docs shows a `git_source` section, so that seems to be the expected way of using a dbt task. I think we would likely want to either require a `git_source` section, or ensure that when one is absent and the Python wheel artifact uses a relative path, the dbt task does not cause a deployment failure.
Actual Behavior
With the `artifacts` section, the deployment fails with a confusing error message. With the `git_source` section, the deployment succeeds. Without either the `artifacts` or `git_source` section, the deployment succeeds, with the artifact directory as the project directory for the dbt task.
OS and CLI version
- OS: Ubuntu 22.04 on WSL2 via Windows 11
- CLI Version: Databricks CLI v0.214.1
Is this a regression?
I tried this in 0.213.0 and it did not work in that version either.
Debug Logs
Debug logs from `databricks bundle deploy --log-level=debug`, one per scenario:
- with_artifacts_section_and_git_source_section.txt
- no_artifacts_section_no_git_source_section.txt
- with_artifacts_section_no_git_source_section.txt