dbt task in asset bundle deployment: errors if `artifacts` is included and `git_source` is missing, uses an inaccurate project directory if both `artifacts` and `git_source` are missing
Describe the issue
The more I attempt to troubleshoot this, the less sure I am about what is a bug and what is by design but confusing.
I originally had an issue because I added a dbt task to my pipeline and forgot to add the git source for the dbt task.
When I attempted to deploy the updated asset bundle, I got the error message: `build failed <package_name>: error chdir <bundle_path>: no such file or directory, output .`
While troubleshooting this I found that if I remove the `artifacts` section from my asset bundle, the deployment succeeds, but the dbt task assumes that the project directory is the asset bundle deployment location, e.g. /Shared/.bundle/dbx_data_quality/dev/files. I assume that this location being used as the project directory for the dbt task is the reason for the error and the failed deployment, but it still seems like a bug, because the path actually did exist already.
Deploying this way results in a task that has these arguments:
I solved the issue by adding a `git_source` section to my job in the asset bundle, which keeps the project directory from being set on the dbt task at all.
```yaml
git_source:
  git_branch: develop
  git_provider: azureDevOpsServices
  git_url: https://<organization>@dev.azure.com/<organization>/<project>/_git/dbx-dbt-legacy
```
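For context, this is roughly where that section sits in the job definition. This is a sketch pieced together from the snippet above and the repro config below (in my bundle the `resources.jobs` block is nested under `targets.dev`), not a verbatim copy of my working configuration:

```yaml
resources:
  jobs:
    dbx_data_quality:
      name: dbx_data_quality (dev)
      git_source:
        git_branch: develop
        git_provider: azureDevOpsServices
        git_url: https://<organization>@dev.azure.com/<organization>/<project>/_git/dbx-dbt-legacy
      tasks:
        # dbt task as in the repro config below; the python wheel task is omitted here
        - dbt_task:
            catalog: dev
            commands:
              - dbt deps
              - dbt test
            schema: corrections
          task_key: dbt_tests
```

With `git_source` present, no project directory is set on the dbt task, as described above.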
Configuration
Please provide a minimal reproducible configuration for the issue
Steps to reproduce the behavior
To reproduce the `Error: build failed dbx_data_quality, ...` error, you need an asset bundle that contains a Python wheel task and a dbt task, with an `artifacts` section in the YAML that uses a relative path. The job cannot have a `git_source` section. The full bundle configuration is below, followed by the deploy command.
```yaml
bundle:
  name: dbx_data_quality

artifacts:
  dbx_pipeline_legacy:
    path: .
    type: whl

targets:
  dev:
    mode: development
    resources:
      jobs:
        dbx_data_quality:
          name: dbx_data_quality (dev)
          tasks:
            - job_cluster_key: basic_cluster
              libraries:
                - whl: ./dist/dbx_data_quality-*.whl
              python_wheel_task:
                entry_point: setup
                package_name: dbx_data_quality
              task_key: setup
            - dbt_task:
                catalog: dev
                commands:
                  - dbt deps
                  - dbt test
                schema: corrections
              depends_on:
                - task_key: setup
              job_cluster_key: basic_cluster
              libraries:
                - pypi:
                    package: dbt-databricks==1.7.8
              run_if: ALL_DONE
              task_key: dbt_tests
    workspace:
      profile: dev
```
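Deploying this bundle with debug logging enabled should then trigger the failure. The command below is the one referenced in the Debug Logs section (add `-t dev` to select the target explicitly if needed):

```sh
databricks bundle deploy --log-level=debug
```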
Expected Behavior
I'm not sure. The example of a dbt task in the docs shows a `git_source` section, so that seems to be the expected way of using a dbt task. I think we would likely want to either require a `git_source` section, or ensure that when one is absent and the Python wheel artifact uses a relative path, the dbt task does not cause a deployment failure.
Actual Behavior
With the `artifacts` section, the deployment fails with a confusing error message. With the `git_source` section, the deployment succeeds. Without either the `artifacts` or `git_source` section, the deployment succeeds, with the artifact directory as the project directory for the dbt task.
OS and CLI version
- OS: Ubuntu 22.04 on WSL2 via Windows 11
- CLI Version: Databricks CLI v0.214.1
Is this a regression?
I tried this in 0.213.0 and it did not work in that version either.
Debug Logs
Debug logs from `databricks bundle deploy --log-level=debug`, one per scenario:
- with_artifacts_section_and_git_source_section.txt
- no_artifacts_section_no_git_source_section.txt
- with_artifacts_section_no_git_source_section.txt