Windows-based Databricks CLI does not parse JSON correctly when trying to run a notebook job

radu-gheorghiu opened this issue 5 years ago · 4 comments

It seems that when a notebook job in Azure Databricks is run with custom parameters, passed in from the Databricks CLI as a JSON string, the JSON parsing fails on the Windows command line with an error like the one below:

C:\Users\radu.gheorghiu>databricks jobs run-now --job-id 2969 --notebook-params '{"system_id":"991", "as_of_date":"2020-05-11", "from_date":"2020-05-01", "to_date":"2020-05-07"}'
Usage: databricks jobs run-now [OPTIONS]
Try 'databricks jobs run-now -h' for help.

Error: Got unexpected extra arguments (as_of_date:2020-05-11, from_date:2020-05-01, to_date:2020-05-07}')

The same command works fine in a UNIX shell; on the Windows command line, however, it fails with the above error.
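For reference, cmd.exe treats single quotes as ordinary characters rather than argument delimiters, so the payload above gets split at the spaces. A form that generally works in cmd.exe (a sketch of the usual Windows quoting, not verified here) wraps the JSON in double quotes and backslash-escapes the inner ones:

C:\Users\radu.gheorghiu>databricks jobs run-now --job-id 2969 --notebook-params "{\"system_id\":\"991\", \"as_of_date\":\"2020-05-11\", \"from_date\":\"2020-05-01\", \"to_date\":\"2020-05-07\"}"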

I'm using the latest version of the Databricks CLI:

C:\Users\radu.gheorghiu>databricks -v
Version 0.10.0

radu-gheorghiu commented on May 22, 2020

I'm facing the same issue with databricks-cli 0.9.1 on Windows.

The example parameters from the help text, {"name": "john doe", "age": 35}, don't work.

The only value that is accepted is an empty map: {}.

rolanddb commented on Jun 5, 2020

Exactly, that's the only scenario that works for me as well. However, that isn't how job parameters are intended to be used. As a workaround, I've been calling the REST API directly to pass parameters and run jobs until this is fixed.
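For reference, a minimal sketch of that REST workaround in PowerShell, assuming the Jobs API 2.0 run-now endpoint; <workspace-url> and <token> are placeholders for your own workspace URL and personal access token:

# Build the request body as a hashtable and let ConvertTo-Json serialize it,
# so the shell's quoting rules never touch the JSON.
$body = @{
    job_id = 2969
    notebook_params = @{
        system_id  = "991"
        as_of_date = "2020-05-11"
    }
} | ConvertTo-Json

# POST to the Jobs API 2.0 run-now endpoint.
Invoke-RestMethod -Method Post `
    -Uri "https://<workspace-url>/api/2.0/jobs/run-now" `
    -Headers @{ Authorization = "Bearer <token>" } `
    -ContentType "application/json" `
    -Body $body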

radu-gheorghiu commented on Jun 5, 2020

Same issue here, this time when creating a job whose definition includes a cron expression, which necessarily contains spaces. Reproducible example, tested on Azure Databricks:

> databricks jobs create --json '"{\"name\":\"Nightly_model_training\",\"new_cluster\":{\"spark_version\":\"7.3.x-scala2.12\",\"node_type_id\":\"Standard_DS12_v2\",\"num_workers\":1},\"libraries\":[{\"jar\":\"dbfs:/my-jar.jar\"},{\"maven\":{\"coordinates\":\"org.jsoup:jsoup:1.7.2\"}}],\"max_retries\":1,\"spark_jar_task\":{\"main_class_name\":\"com.databricks.ComputeModels\"}}"'
{
  "job_id": 12
}

> databricks jobs delete --job-id 12

> databricks jobs create --json '"{\"name\":\"Nightly_model_training\",\"new_cluster\":{\"spark_version\":\"7.3.x-scala2.12\",\"node_type_id\":\"Standard_DS12_v2\",\"num_workers\":1},\"libraries\":[{\"jar\":\"dbfs:/my-jar.jar\"},{\"maven\":{\"coordinates\":\"org.jsoup:jsoup:1.7.2\"}}],\"max_retries\":1,\"schedule\":{\"quartz_cron_expression\":\"0 15 22 ? * *\",\"timezone_id\":\"America/Los_Angeles\"},\"spark_jar_task\":{\"main_class_name\":\"com.databricks.ComputeModels\"}}"'
Usage: databricks jobs create [OPTIONS]
Try 'databricks jobs create -h' for help.

Error: Got unexpected extra arguments (15 22 ? * *","timezone_id":"America/Los_Angeles"},"spark_jar_task":{"main_class_name":"com.databricks.ComputeModels"}})

Using the Anaconda Prompt in PowerShell on Windows 10. Versions:

> conda --version
conda 4.9.2

> $PSVersionTable.PSVersion
Major  Minor  Build  Revision
-----  -----  -----  --------
5      1      19041  906

> databricks -v
Version 0.14.0

My workaround is to use WSL together with PowerShell Core, and that works. (The deployment script has some extra steps in PowerShell.)
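Another option that sidesteps the Windows quoting layer entirely is to write the payload to a file and pass it with --json-file (the same flag the next comment uses). A sketch in PowerShell, reusing the failing payload above; inside single quotes PowerShell treats the JSON literally, so no escaping is needed:

# Write the raw JSON to a file; single quotes keep it verbatim.
Set-Content -Path job.json -Value '{"name":"Nightly_model_training","new_cluster":{"spark_version":"7.3.x-scala2.12","node_type_id":"Standard_DS12_v2","num_workers":1},"libraries":[{"jar":"dbfs:/my-jar.jar"},{"maven":{"coordinates":"org.jsoup:jsoup:1.7.2"}}],"max_retries":1,"schedule":{"quartz_cron_expression":"0 15 22 ? * *","timezone_id":"America/Los_Angeles"},"spark_jar_task":{"main_class_name":"com.databricks.ComputeModels"}}'
databricks jobs create --json-file job.json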

emerrf commented on Apr 29, 2021

I had the same symptoms. In my case the fix was to save the JSON to a file with ANSI (or UTF-8) encoding and then run "databricks jobs create" on the resulting file (with edits, see below). Note that the input was already UTF-8 when the error occurred, which should have worked but did not; saving it back out with an explicit encoding may strip out something the CLI does not like.

In Powershell:

# Export the existing job definition.
$settings = databricks jobs get --job-id 123456
$settings | Out-File -Encoding ASCII <filename>.json

# Note: you have to edit the output JSON file to remove the "job_id" and
# "settings" nodes and their corresponding level (the outer brackets).

# Recreate the job from the cleaned-up file.
databricks jobs create --json-file <filename>.json
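A possible refinement (a sketch, not verified): parse the output and extract the inner "settings" object programmatically instead of editing the file by hand. Out-String joins the CLI's multi-line output into a single string so that ConvertFrom-Json can parse it under Windows PowerShell 5.1:

# Keep only the "settings" node, which is the shape "jobs create" expects.
$job = databricks jobs get --job-id 123456 | Out-String | ConvertFrom-Json
$job.settings | ConvertTo-Json -Depth 20 | Out-File -Encoding ASCII <filename>.json

databricks jobs create --json-file <filename>.json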

Hope this helps someone!

TJ4565 commented on Mar 11, 2022