Complex variable can't reference complex variable
Hey, so I have a job with a lot of tasks, and it's the same for every target; the only difference between targets is the parameters of the first task.
Since I want to put the task definitions into a complex variable, I have to find a way to pass the parameters of that task as a variable as well.
Will referencing a complex variable within another complex variable be possible? Do you have an idea how to tackle this problem?
Is it possible to do something like this?
resources:
  jobs:
    inference:
      name: "${bundle.name}-${var.segment}-inference"
      job_clusters:
        - job_cluster_key: inference
          new_cluster: ${var.inference_cluster}
      tasks:
        - task_key: sensor
          ....
        ${var.inference_tasks}
Similar issue here. Before complex variables we wrote a lot of boilerplate code to declare all our job clusters.
We wanted to reduce the boilerplate as much as possible, and we tried this before discovering the limitation:
variables:
  cluster_xxl:
    description: Spark big cluster
    type: complex
    default:
      spark_version: ${var.spark_version}
      spark_conf: ${var.spark_conf}
      num_workers: 1
      aws_attributes: ${var.aws_attributes}
      node_type_id: m6g.2xlarge
      spark_env_vars: ${var.spark_env_vars}
      enable_local_disk_encryption: true
  cluster_xl:
    description: Spark medium cluster
    type: complex
    default:
      spark_version: ${var.spark_version}
      spark_conf: ${var.spark_conf}
      num_workers: 1
      aws_attributes: ${var.aws_attributes}
      node_type_id: m6g.xlarge
      spark_env_vars: ${var.spark_env_vars}
      enable_local_disk_encryption: true
  # and we have more ....
We fixed it by duplicating the values of referenced complex variables, but it would be nice if you could remove this limitation.
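For illustration, the duplication workaround meant inlining the referenced values directly into each complex variable; the concrete values below are placeholders:

variables:
  cluster_xxl:
    description: Spark big cluster
    type: complex
    default:
      spark_version: 14.3.x-scala2.12     # inlined instead of ${var.spark_version}
      spark_conf:
        spark.executor.memory: 8g         # inlined instead of ${var.spark_conf}
      num_workers: 1
      node_type_id: m6g.2xlarge
      enable_local_disk_encryption: true
  # ... the same inlining repeated for cluster_xl and the rest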
Thanks
@dinjazelena
I have a job with a lot of tasks and it's the same for every target
If the job is the same for every target, can't it be defined without variables? You could still use a complex variable for the parameters of the first task and specify overrides for this variable to customize it per target.
resources:
  jobs:
    job_with_parameters:
      name: job_with_parameters
      tasks:
        - task_key: task_a
          spark_python_task:
            python_file: ../src/task_a.py
            parameters: ${var.first_task_parameters}
        - task_key: task_b
          depends_on:
            - task_key: task_a
          spark_python_task:
            python_file: ../src/task_a.py
            parameters:
              - "--foo=bar"
To customize the parameters, you can define a different value per target:
targets:
  dev:
    variables:
      first_task_parameters:
        - "--mode=dev"
        - "--something=else"
  prod:
    variables:
      first_task_parameters:
        - "--mode=prod"
        - "--hello=world"
@ribugent Thanks for commenting on your use case, we'll take it into consideration.
I agree it would be nice and in line with expectations, but it takes a bit of an investment to make it possible, and as such we need to trade off the priority. First in line was getting complex variables out in their current form.
One other potential option here is to use YAML anchors in combination with complex variables. Anchors can help reuse the common parts of complex variable configuration (see the sketch below).
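A rough sketch of what that could look like (the cluster values here are just illustrative, and anchors only apply within a single YAML file):

variables:
  cluster_xxl:
    type: complex
    default:
      # Anchor the shared settings on the first definition...
      <<: &common_cluster
        spark_version: 14.3.x-scala2.12
        enable_local_disk_encryption: true
      node_type_id: m6g.2xlarge
      num_workers: 1
  cluster_xl:
    type: complex
    default:
      # ...and merge them into every other cluster variable.
      <<: *common_cluster
      node_type_id: m6g.xlarge
      num_workers: 1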
Same issue here,
I am trying to create two clusters that share most of their definition except the runtime. These are used in different jobs. As an example:
variables:
  catalog:
    default: hive_metastore
  spark_conf:
    default:
      spark.databricks.sql.initial.catalog.name: ${var.catalog}
  etl_cluster_config:
    type: complex
    default:
      spark_version: 14.3.x-scala2.12
      runtime_engine: PHOTON
      spark_conf: ${var.spark_conf}
  ml_cluster_config:
    type: complex
    default:
      spark_version: 14.3.x-cpu-ml-scala2.12
      spark_conf: ${var.spark_conf}
If there is another way to do this, let me know.
Thanks!
I'm experiencing a similar issue. I have many jobs (~ 200) with the same settings except for the number of workers, which is different for each job. I have tuned the number of workers according to the resources each job uses, so I'd like to manage it this way.
However, this is currently not possible with complex variables. I'm looking for a way to override the setting like the following:
resources:
  jobs:
    aa:
      ...
      job_clusters:
        - job_cluster_key: default
          new_cluster:
            <<: ${var.legacy-multi-node-job}
            num_workers: 1
    bb:
      ...
      job_clusters:
        - job_cluster_key: default
          new_cluster:
            <<: ${var.legacy-multi-node-job}
            num_workers: 7
    ... # many jobs with different num of workers
While it is possible to use native YAML anchors, the jobs are spread across several YAML files, and since anchors have to be declared in each file, they are difficult to maintain when the configuration changes, so I prefer not to use them (see the sketch below).
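For reference, the anchor-based version would look roughly like the following (cluster fields are illustrative), and it has to be repeated in every YAML file that defines these jobs:

resources:
  jobs:
    aa:
      job_clusters:
        - job_cluster_key: default
          # The anchor is declared on the first job in this file...
          new_cluster: &legacy-multi-node-job
            spark_version: 14.3.x-scala2.12
            node_type_id: m6g.xlarge
            num_workers: 1
    bb:
      job_clusters:
        - job_cluster_key: default
          new_cluster:
            # ...and merged into the other jobs, but only within this file.
            <<: *legacy-multi-node-job
            num_workers: 7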
I'm experiencing similar issues to those already mentioned, and it renders complex variables quite useless in my case. There's also an issue with the order in which variables are evaluated. If you have a top-level definition of a job like the following:
variables:
  cluster_tags:
    description: General cluster tags
    type: complex
    default:
      foo: bar
      baz: zoo

resources:
  jobs:
    my_job:
      name: my_job
      ...
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            ...
            custom_tags: ${var.cluster_tags}
then you can't add or override the cluster's custom tags for different targets like the following:
targets:
  dev:
    resources:
      jobs:
        my_job:
          job_clusters:
            - job_cluster_key: job_cluster
              new_cluster:
                custom_tags:
                  ResourceClass: SingleNode
This results in Error: cannot merge string with map, because the variable is not evaluated before the merge.
I currently have to use YAML anchors, since that's the only way I can unpack maps and merge them correctly. I'm expecting behavior similar to what @yb-yu explained in https://github.com/databricks/cli/issues/1593#issuecomment-2370101969
I am trying to implement the exact same scenario @yb-yu mentioned in https://github.com/databricks/cli/issues/1593#issuecomment-2370101969 and I'm getting the same error. A fix for this issue would allow us to remove a lot of repeated code.