
[Feature] Add support for cluster execution of arbitrary notebook code

lukeSmth opened this issue 2 years ago · 1 comment

Loving the extension! Huge improvement for using engineering best practices and integrating Databricks compute with the larger ecosystem of locally executed tools.

I'd like to see support for executing arbitrary notebook code (not just Spark calls) on remote Databricks clusters. This would let local developers seamlessly use Databricks compute for heavy, non-Spark workloads (for example, model training).

Two approaches come to mind:

  1. Pipe commands to the Command Execution API, possibly using a local Jupyter kernel to bridge the notebook environment and Databricks (see the sketch under Command Execution API below).
  2. Connect to a Jupyter kernel on the driver node over SSH (see the sketch under SSH below).

Command Execution API

The Databricks Power Tools extension solves this by using the Command Execution API.

I don't know Rust, but as far as I can tell the article Connecting Jupyter with Databricks aims to wrap this API in a local Jupyter kernel (which would allow connections from any Jupyter client).
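For concreteness, here is a minimal sketch of what piping a command through the Command Execution API (API 1.2) could look like: create an execution context on the cluster, submit arbitrary Python, and poll for the result. The HOST, TOKEN, and CLUSTER_ID values are placeholders, and the polling loop is simplified; this illustrates the API flow, not a proposed implementation:

```python
import time
import requests

# Placeholders: fill in for your workspace and cluster.
HOST = "https://<workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
CLUSTER_ID = "<cluster-id>"

headers = {"Authorization": f"Bearer {TOKEN}"}

# 1. Create a remote execution context on the cluster.
ctx = requests.post(
    f"{HOST}/api/1.2/contexts/create",
    headers=headers,
    json={"clusterId": CLUSTER_ID, "language": "python"},
).json()

# 2. Submit arbitrary Python (not just Spark) to run on the driver.
cmd = requests.post(
    f"{HOST}/api/1.2/commands/execute",
    headers=headers,
    json={
        "clusterId": CLUSTER_ID,
        "contextId": ctx["id"],
        "language": "python",
        "command": "import platform; print(platform.python_version())",
    },
).json()

# 3. Poll until the command reaches a terminal state.
while True:
    status = requests.get(
        f"{HOST}/api/1.2/commands/status",
        headers=headers,
        params={
            "clusterId": CLUSTER_ID,
            "contextId": ctx["id"],
            "commandId": cmd["id"],
        },
    ).json()
    if status["status"] in ("Finished", "Error", "Cancelled"):
        break
    time.sleep(1)

print(status["results"])
```

A local Jupyter kernel wrapper would presumably run this create-context / execute / poll loop once per cell and translate the results back into kernel messages.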

SSH

This seems the most straightforward option in terms of net new code required. It also looks essentially identical to the deprecated (for security reasons?) jupyterlab-integration project.
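As a rough illustration of the SSH route, assuming SSH access to the driver is already configured (e.g., a public key on the cluster and the driver reachable on port 2200, as on AWS deployments), the local side is mostly port forwarding plus a standard Jupyter client. The host, ports, and connection-file name below are all hypothetical:

```python
import subprocess

# Hypothetical values: the driver address and SSH port depend on the
# deployment and on SSH access being enabled for the cluster.
DRIVER = "ubuntu@<driver-public-dns>"
SSH_PORT = "2200"

# A Jupyter kernel's connection file (kernel-*.json on the driver) lists
# five ZMQ ports (shell, iopub, stdin, control, hb); the numbers here are
# illustrative and must match that file.
KERNEL_PORTS = [53000, 53001, 53002, 53003, 53004]

forwards = []
for port in KERNEL_PORTS:
    forwards += ["-L", f"{port}:localhost:{port}"]

# Hold the tunnel open without running a remote command (-N).
tunnel = subprocess.Popen(["ssh", "-p", SSH_PORT, *forwards, "-N", DRIVER])

# With the tunnel up, any local Jupyter client can attach to a local copy
# of the driver's connection file, e.g.:
#   jupyter console --existing ./kernel-remote.json
```

As I understand it, this is roughly what jupyterlab-integration automated: starting a kernel on the driver, copying its connection file locally, and tunneling the ZMQ ports over SSH.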

lukeSmth · Dec 14 '23 17:12

+1 on this

@kartikgupta-db - If you have a rough idea of what would need to change to implement this and would accept a PR, I'd be willing to have a go. I just need some guidance on getting started.

MrTeale · May 02 '24 22:05

Closing this in favor of https://github.com/databricks/databricks-vscode/issues/472

ilia-db · Mar 04 '25 09:03