Add `jobs run-script` command
Use case: `databricks jobs run-script metastore-export.py` as an equivalent for:

```shell
# $RANDOM expands to a fresh value on every use, so capture it once
SUFFIX=$RANDOM
databricks workspace import metastore-export.py /tmp/metastore-export_$SUFFIX --language python --overwrite
# create a Spark Submit job - https://docs.databricks.com/dev-tools/api/latest/jobs.html#runs-submit
# wait for completion of the job, perhaps showing a status update every 30 seconds
databricks fs rm dbfs:/tmp/metastore-export_$SUFFIX
```
TBD: cluster parameters
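For illustration, a minimal sketch of what `jobs run-script` could do under the hood, against the documented `runs/submit` and `runs/get` endpoints linked above; the host/token handling, the DBFS staging path, and the `new_cluster` spec are placeholder assumptions, not a proposed design:

```python
# Hypothetical sketch of what `jobs run-script` could do internally, using the
# Jobs API 2.0 runs/submit and runs/get endpoints. Host/token handling, the
# DBFS staging path, and the new_cluster spec are placeholder assumptions.
import time
import requests

HOST = "https://<databricks-instance>"          # assumption: read from CLI config
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

def run_script(dbfs_path):
    # Submit a one-time run of a script previously staged on DBFS.
    run_id = requests.post(
        f"{HOST}/api/2.0/jobs/runs/submit",
        headers=HEADERS,
        json={
            "run_name": "run-script",
            "spark_python_task": {"python_file": dbfs_path},
            "new_cluster": {                     # TBD: expose as CLI parameters
                "spark_version": "7.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
        },
    ).json()["run_id"]

    # Poll runs/get, showing a status update every 30 seconds.
    while True:
        run = requests.get(
            f"{HOST}/api/2.0/jobs/runs/get",
            headers=HEADERS,
            params={"run_id": run_id},
        ).json()
        state = run["state"]["life_cycle_state"]
        print(f"Status: {state}")
        if state in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return run
        time.sleep(30)
```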
Could be fixed by #455
With #455 you could do this:

```shell
databricks execution-context command-execute-once --cluster-id <CLUSTER_ID> --command "$(cat metastore-export.py)" --wait
```
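For context, `command-execute-once` presumably wraps the REST 1.2 command execution API (`contexts/create`, `commands/execute`, `commands/status`); a rough sketch of that flow, with placeholder host/token handling:

```python
# Rough sketch of the REST 1.2 command execution flow that
# command-execute-once presumably wraps; host/token handling is a placeholder.
import time
import requests

HOST = "https://<databricks-instance>"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

def execute_once(cluster_id, command):
    # Create a one-off execution context on the cluster.
    ctx = requests.post(
        f"{HOST}/api/1.2/contexts/create",
        headers=HEADERS,
        json={"clusterId": cluster_id, "language": "python"},
    ).json()["id"]
    try:
        # Run the command inside that context.
        cmd = requests.post(
            f"{HOST}/api/1.2/commands/execute",
            headers=HEADERS,
            json={"clusterId": cluster_id, "contextId": ctx,
                  "language": "python", "command": command},
        ).json()["id"]

        # Poll until the command reaches a terminal state.
        while True:
            status = requests.get(
                f"{HOST}/api/1.2/commands/status",
                headers=HEADERS,
                params={"clusterId": cluster_id,
                        "contextId": ctx, "commandId": cmd},
            ).json()
            print(f"Status: {status['status']}")
            if status["status"] not in ("Queued", "Running", "Cancelling"):
                return status
            time.sleep(1)
    finally:
        # Clean up the context so it does not linger on the cluster.
        requests.post(
            f"{HOST}/api/1.2/contexts/destroy",
            headers=HEADERS,
            json={"clusterId": cluster_id, "contextId": ctx},
        )
```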
@fjakobs how can we make that invocation simpler?
Just sketching: we could

- move the command under the `cluster` group
- make `--wait=True` the default
- add an argument that reads the command from a file
- add an option to reference a cluster by name

```shell
databricks cluster execute --cluster-name <CLUSTER_NAME> --command-file metastore-export.py
```
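A hypothetical sketch of that option surface using the `click` decorators databricks-cli is already built on; every name here is illustrative only:

```python
# Hypothetical option surface for the proposed command, using click,
# which databricks-cli already builds on; all names are illustrative.
import click

@click.command('execute')
@click.option('--cluster-id', default=None, help='Cluster to run the command on.')
@click.option('--cluster-name', default=None,
              help='Resolve the cluster by name instead of by id.')
@click.option('--command-file', type=click.File('r'), default=None,
              help='Read the command from a file instead of --command.')
@click.option('--wait/--no-wait', default=True,
              help='Block until the command finishes (default: wait).')
def execute_cli(cluster_id, cluster_name, command_file, wait):
    """Execute a command on a cluster and print its output."""
    command = command_file.read() if command_file else None
    ...
```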
@fjakobs that looks a lot simpler!
How are the results going to be exported, e.g. if there's a `return` at the end of the script?
We could have different modes. The default could just be informative text-based output like I have already implemented:

```
$ python -m databricks_cli.cli execution-context command-execute-once --cluster-id <CLUSTER_ID> --command "$(cat spark.py)" --wait=True
Status: Queued
Status: Running
Status: Finished
Command ID: c98edf8a-418a-4fa0-b69c-dcfae9db917e
output > +---------+----------+--------+----------+------+------+
output > |firstname|middlename|lastname|       dob|gender|salary|
output > +---------+----------+--------+----------+------+------+
output > |    James|          |   Smith|1991-04-01|     M|  3000|
output > |  Michael|      Rose|        |2000-05-19|     M|  4000|
output > |   Robert|          |Williams|1978-09-05|     M|  4000|
output > |    Maria|      Anne|   Jones|1967-12-01|     F|  4000|
output > |      Jen|      Mary|   Brown|1980-02-17|     F|    -1|
output > +---------+----------+--------+----------+------+------+
```
For use in shell scripts, we can also just return the last status result with the embedded data as JSON:

```
$ python -m databricks_cli.cli execution-context command-execute-once --cluster-id <CLUSTER_ID> --command "$(cat spark.py)" --wait=True --output=json
{
  "id": "b976fcff-8a32-4278-a3ff-cb684945e238",
  "status": "Finished",
  "results": {
    "resultType": "text",
    "data": "+---------+----------+--------+----------+------+------+\n|firstname|middlename|lastname|       dob|gender|salary|\n+---------+----------+--------+----------+------+------+\n|    James|          |   Smith|1991-04-01|     M|  3000|\n|  Michael|      Rose|        |2000-05-19|     M|  4000|\n|   Robert|          |Williams|1978-09-05|     M|  4000|\n|    Maria|      Anne|   Jones|1967-12-01|     F|  4000|\n|      Jen|      Mary|   Brown|1980-02-17|     F|    -1|\n+---------+----------+--------+----------+------+------+"
  }
}
```
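That JSON shape would compose well with standard tooling, e.g. pulling out just the result payload with `jq` (a hypothetical invocation, assuming the simplified command sketched above):

```shell
# Hypothetical: extract only the result payload for further processing.
databricks cluster execute --cluster-name <CLUSTER_NAME> \
  --command-file spark.py --output=json | jq -r '.results.data'
```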
For tabular data as returned from SQL commands, a CSV output would also be nice.