Error logs on Databricks Runtime 11.3 LTS do not display correctly.
Expected Behavior
When looking at the Databricks UI, the error message with the stack trace is displayed clearly.
Current Behavior
When running the job on the Databricks Runtime 11.3 LTS, the error message in the UI contains raw ANSI escape sequences:
== SQL ==
this table doesn't exist
-----^^^
[0;31m---------------------------------------------------------------------------[0m
[0;31mParseException[0m Traceback (most recent call last)
[0;32m<command--1>[0m in [0;36m<cell line: 13>[0;34m()[0m
[1;32m 12[0m [0;34m[0m[0m
[1;32m 13[0m [0;32mwith[0m [0mopen[0m[0;34m([0m[0mfilename[0m[0;34m,[0m [0;34m"rb"[0m[0;34m)[0m [0;32mas[0m [0mf[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m---> 14[0;31m [0mexec[0m[0;34m([0m[0mcompile[0m[0;34m([0m[0mf[0m[0;34m.[0m[0mread[0m[0;34m([0m[0;34m)[0m[0;34m,[0m [0mfilename[0m[0;34m,[0m [0;34m'exec'[0m[0;34m)[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 15[0m [0;34m[0m[0m
[0;32m/tmp/tmpxxsbdj9b.py[0m in [0;36m<module>[0;34m[0m
[1;32m 7[0m [0;34m[0m[0m
[1;32m 8[0m [0;32mif[0m [0m__name__[0m [0;34m==[0m [0;34m"__main__"[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m----> 9[0;31m [0mentrypoint[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m
[0;32m/tmp/tmpxxsbdj9b.py[0m in [0;36mentrypoint[0;34m()[0m
[1;32m 4[0m [0;32mdef[0m [0mentrypoint[0m[0;34m([0m[0;34m)[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[1;32m 5[0m [0mspark[0m [0;34m=[0m [0mSparkSession[0m[0;34m.[0m[0mbuilder[0m[0;34m.[0m[0mgetOrCreate[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m----> 6[0;31m [0mspark[0m[0;34m.[0m[0mtable[0m[0;34m([0m[0;34m"this table doesn't exist"[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 7[0m [0;34m[0m[0m
[1;32m 8[0m [0;32mif[0m [0m__name__[0m [0;34m==[0m [0;34m"__main__"[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m/databricks/spark/python/pyspark/instrumentation_utils.py[0m in [0;36mwrapper[0;34m(*args, **kwargs)[0m
[1;32m 46[0m [0mstart[0m [0;34m=[0m [0mtime[0m[0;34m.[0m[0mperf_counter[0m[0;34m([0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[1;32m 47[0m [0;32mtry[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0;32m---> 48[0;31m [0mres[0m [0;34m=[0m [0mfunc[0m[0;34m([0m[0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 49[0m logger.log_success(
[1;32m 50[0m [0mmodule_name[0m[0;34m,[0m [0mclass_name[0m[0;34m,[0m [0mfunction_name[0m[0;34m,[0m [0mtime[0m[0;34m.[0m[0mperf_counter[0m[0;34m([0m[0;34m)[0m [0;34m-[0m [0mstart[0m[0;34m,[0m [0msignature[0m[0;34m[0m[0;34m[0m[0m
[0;32m/databricks/spark/python/pyspark/sql/session.py[0m in [0;36mtable[0;34m(self, tableName)[0m
[1;32m 1138[0m [0;32mTrue[0m[0;34m[0m[0;34m[0m[0m
[1;32m 1139[0m """
[0;32m-> 1140[0;31m [0;32mreturn[0m [0mDataFrame[0m[0;34m([0m[0mself[0m[0;34m.[0m[0m_jsparkSession[0m[0;34m.[0m[0mtable[0m[0;34m([0m[0mtableName[0m[0;34m)[0m[0;34m,[0m [0mself[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 1141[0m [0;34m[0m[0m
[1;32m 1142[0m [0;34m@[0m[0mproperty[0m[0;34m[0m[0;34m[0m[0m
[0;32m/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py[0m in [0;36m__call__[0;34m(self, *args)[0m
[1;32m 1319[0m [0;34m[0m[0m
[1;32m 1320[0m [0manswer[0m [0;34m=[0m [0mself[0m[0;34m.[0m[0mgateway_client[0m[0;34m.[0m[0msend_command[0m[0;34m([0m[0mcommand[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;32m-> 1321[0;31m return_value = get_return_value(
[0m[1;32m 1322[0m answer, self.gateway_client, self.target_id, self.name)
[1;32m 1323[0m [0;34m[0m[0m
[0;32m/databricks/spark/python/pyspark/sql/utils.py[0m in [0;36mdeco[0;34m(*a, **kw)[0m
[1;32m 200[0m [0;31m# Hide where the exception came from that shows a non-Pythonic[0m[0;34m[0m[0;34m[0m[0;34m[0m[0m
[1;32m 201[0m [0;31m# JVM exception message.[0m[0;34m[0m[0;34m[0m[0;34m[0m[0m
[0;32m--> 202[0;31m [0;32mraise[0m [0mconverted[0m [0;32mfrom[0m [0;32mNone[0m[0;34m[0m[0;34m[0m[0m
[0m[1;32m 203[0m [0;32melse[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[1;32m 204[0m [0;32mraise[0m[0;34m[0m[0;34m[0m[0m
[0;31mParseException[0m:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'table'(line 1, pos 5)
== SQL ==
this table doesn't exist
-----^^^
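(The [0;31m / [0m fragments above are ANSI SGR color codes; the leading escape byte \x1b is non-printable, so only the bracketed remainder survives copy-paste. As a minimal sketch of a post-processing workaround, assuming you capture the error text yourself — strip_ansi is a hypothetical helper, not part of dbx or Databricks:

import re

# ANSI SGR sequences look like "\x1b[0;31m": escape byte, "[", optional
# semicolon-separated numeric parameters, then "m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(text: str) -> str:
    # Hypothetical helper: remove color codes before displaying or logging.
    return ANSI_SGR.sub("", text)

print(strip_ansi("\x1b[0;31mParseException\x1b[0m"))  # -> ParseException
)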
Steps to Reproduce (for bugs)
- Create a script which fails and configure it to run on two different Databricks Runtime versions: 11.3 LTS and 10.4 LTS.
- Run the dbx deploy command.
- Execute the workflow using the UI.
- Observe how the error message is displayed.
Context
I've noticed a problem with how the error message is displayed on Databricks Runtime 11.3 LTS. To verify this, here is an example setup:
Parts of the deployment file:
custom:
  cluster-11-3: &cluster-11-3
    new_cluster:
      spark_version: "11.3.x-scala2.12"
      num_workers: 1
      node_type_id: "i3.xlarge"
      aws_attributes:
        ...[REDACTED]...

  cluster-10-4: &cluster-10-4
    new_cluster:
      spark_version: "10.4.x-scala2.12"
      num_workers: 1
      node_type_id: "i3.xlarge"
      aws_attributes:
        ...[REDACTED]...

build:
  no_build: true

environments:
  default:
    workflows:
      - name: "run-python-task"
        tasks:
          - task_key: "run-11-3"
            <<: *cluster-11-3
            spark_python_task:
              python_file: "file://cicd_sample_project/main.py"
              parameters: []
          - task_key: "run-10-4"
            <<: *cluster-10-4
            spark_python_task:
              python_file: "file://cicd_sample_project/main.py"
              parameters: []
Content of the cicd_sample_project/main.py file:
from pyspark.sql import SparkSession


def entrypoint():
    spark = SparkSession.builder.getOrCreate()
    spark.table("this table doesn't exist")


if __name__ == "__main__":
    entrypoint()
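For comparison, a minimal local sketch (assuming the pyspark==3.2.1 pinned in setup.py below is installed): triggering the same failure under a plain Python interpreter raises a ParseException whose message contains no ANSI escape sequences, which would point at the driver/notebook layer on 11.3 LTS as the source of the coloring rather than at PySpark itself.

from pyspark.sql import SparkSession
from pyspark.sql.utils import ParseException

spark = SparkSession.builder.master("local[1]").getOrCreate()
try:
    spark.table("this table doesn't exist")  # same call as in main.py
except ParseException as e:
    # Under a plain interpreter the message is uncolored text.
    print("contains ANSI escapes:", "\x1b[" in str(e))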
setup.py file:
"""
This file configures the Python package with entrypoints used for future runs on Databricks.
Please follow the `entry_points` documentation for more details on how to configure the entrypoint:
* https://setuptools.pypa.io/en/latest/userguide/entry_point.html
"""
from setuptools import find_packages, setup
from cicd_sample_project import __version__
PACKAGE_REQUIREMENTS = ["pyyaml"]
# packages for local development and unit testing
# please note that these packages are already available in DBR, there is no need to install them on DBR.
LOCAL_REQUIREMENTS = [
"pyspark==3.2.1",
"delta-spark==1.1.0",
]
TEST_REQUIREMENTS = [
# development & testing tools
"dbx>=0.8,<0.9"
]
setup(
name="cicd_sample_project",
packages=find_packages(exclude=["tests", "tests.*"]),
setup_requires=["setuptools","wheel"],
install_requires=PACKAGE_REQUIREMENTS,
extras_require={"local": LOCAL_REQUIREMENTS, "test": TEST_REQUIREMENTS},
entry_points = {
"console_scripts": [
]},
version=__version__,
description="",
author="",
)
Your Environment
- dbx version used:
  ➜ cicd-sample-project git:(main) ✗ dbx --version
  [dbx][2022-11-17 13:35:01.971] 🧱 Databricks eXtensions aka dbx, version ~> 0.8.7
- Databricks Runtime version:
  - 11.3 LTS
  - 10.4 LTS
- python venv:
  (.venv) ➜ cicd-sample-project git:(main) ✗ pip freeze
  aiohttp==3.8.3
  aiosignal==1.3.1
  arrow==1.2.3
  async-timeout==4.0.2
  attrs==22.1.0
  binaryornot==0.4.4
  certifi==2022.9.24
  cffi==1.15.1
  chardet==5.0.0
  charset-normalizer==2.1.1
  # Editable Git install with no remote (cicd-sample-project==0.0.1)
  -e REDACTED/cicd-sample-project
  click==8.1.3
  cloudpickle==2.2.0
  colorama==0.4.6
  commonmark==0.9.1
  cookiecutter==2.1.1
  cryptography==38.0.3
  databricks-cli==0.17.3
  dbx==0.8.7
  decorator==5.1.1
  delta-spark==1.1.0
  entrypoints==0.4
  frozenlist==1.3.3
  gitdb==4.0.9
  GitPython==3.1.29
  idna==3.4
  importlib-metadata==5.0.0
  Jinja2==3.1.2
  jinja2-time==0.2.0
  MarkupSafe==2.1.1
  mlflow-skinny==2.0.0
  multidict==6.0.2
  oauthlib==3.2.2
  packaging==21.3
  pathspec==0.10.2
  protobuf==4.21.9
  py==1.11.0
  py4j==0.10.9.3
  pycparser==2.21
  pydantic==1.10.2
  Pygments==2.13.0
  PyJWT==2.6.0
  pyparsing==3.0.9
  pyspark==3.2.1
  python-dateutil==2.8.2
  python-slugify==6.1.2
  pytz==2022.6
  PyYAML==6.0
  requests==2.28.1
  retry==0.9.2
  rich==12.6.0
  shellingham==1.5.0
  six==1.16.0
  smmap==5.0.0
  sqlparse==0.4.3
  tabulate==0.9.0
  text-unidecode==1.3
  typer==0.7.0
  typing_extensions==4.4.0
  urllib3==1.26.12
  watchdog==2.1.9
  yarl==1.8.1
  zipp==3.10.0
- Local OS info:
  macOS 12.6.1 (21G217)
hi @Squaess, thanks a lot for opening the issue. I'll try to repro it and see what causes it.
The same thing happens for DBR 11.0 and 11.1 ML.
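For what it's worth, one hypothesis consistent with that version range (not confirmed here) is that DBR 11.x executes Python through an IPython-based kernel, and IPython's traceback formatter colorizes output with exactly these SGR sequences. A small sketch of that behavior, assuming IPython 8.x is installed (IPython is not part of the environment listed above):

import sys
from IPython.core.ultratb import AutoFormattedTB

# IPython's formatter emits ANSI-colored tracebacks by default.
formatter = AutoFormattedTB(mode="Context", color_scheme="Linux")
try:
    1 / 0
except ZeroDivisionError:
    colored = formatter.text(*sys.exc_info())
    print(repr(colored[:60]))  # repr() exposes the "\x1b[0;31m..." escapes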