
Running pytest with local spark session

rotemb-cye opened this issue 1 year ago · 3 comments

Hey,

I am trying to run pytest on my local PC with the Databricks extension installed. I am trying to create a local Spark session:


```python
import pytest
from pyspark.sql import SparkSession


def get_spark_session():
    spark = (
        SparkSession.builder.master("local[*]")
        .appName("local-tests")
        .config("spark.driver.bindAddress", "127.0.0.1")
        .getOrCreate()
    )
    return spark


@pytest.fixture(scope="session")
def spark_session():
    spark = get_spark_session()
    yield spark
    spark.stop()
```

and I get the following error: `RuntimeError: Only remote Spark sessions using Databricks Connect are supported. Could not find connection parameters to start a Spark remote session.`

How can I solve this? I want to be able to run my pytest suite while offline.

Thanks!

rotemb-cye avatar Mar 21 '24 10:03 rotemb-cye

Hi, this is the exact same issue we have been struggling with. It seems that installing databricks-connect modifies the installed pyspark package so that it throws this error. I'm also interested in finding a workaround, because in its current state this basically blocks using Databricks Connect.
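One way to see what installing databricks-connect did to an environment is to check which installed distribution actually provides the `pyspark` module. The helper below is an illustrative sketch (not from this thread), using only the standard library:

```python
from importlib.metadata import distributions


def providers_of(module_name: str = "pyspark"):
    """Return names of installed distributions that ship `module_name`.

    If this returns ["databricks-connect"], the pyspark on sys.path is the
    Databricks Connect variant, which only supports remote sessions.
    """
    found = []
    for dist in distributions():
        # dist.files can be None for some install methods; treat as empty.
        for f in dist.files or []:
            if f.parts and f.parts[0] == module_name:
                found.append(dist.metadata["Name"])
                break
    return found


print(providers_of("pyspark"))
```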

htuomola avatar May 03 '24 06:05 htuomola

Hello, I managed to get my local Spark session working via the following VS Code command palette entry:

[image: screenshot of the command palette]

In fact, even uninstalling the extension did not work.

benoitLebreton-perso avatar May 13 '24 15:05 benoitLebreton-perso

Thanks for your solution @benoitLebreton-perso! Do you know how to fix the issue when running pytest from the command line?

odimko avatar Aug 27 '24 18:08 odimko

@benoitLebreton-perso, did you manage to have two versions of pyspark installed? Or did you go the route of uninstalling databricks-connect?

The big issue seems to be that installing databricks-connect removes the full pyspark package, which is extremely annoying. It would be much better to sideload and patch commands only when they are invoked in a databricks-connect context.

bestekov avatar Oct 28 '24 16:10 bestekov

> @benoitLebreton-perso, did you manage to have two versions of pyspark installed? Or did you go the route of uninstalling databricks-connect?
>
> The big issue seems to be that installing databricks-connect removes the full pyspark package, which is extremely annoying. It would be much better to sideload and patch commands only when they are invoked in a databricks-connect context.

I uninstalled databricks-connect. I work in a local environment with a local PySpark session. I now use a Databricks Spark session only in notebooks, and I sync my local code with my repos.
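For anyone following the same route, keeping the two setups apart can be sketched as below. These are illustrative commands assuming pip and venv on a POSIX shell; the environment names are made up:

```shell
# Keep plain PySpark and Databricks Connect in separate virtual environments,
# since databricks-connect ships its own pyspark and the two conflict.

# Environment for local, offline pytest runs:
python -m venv .venv-local
. .venv-local/bin/activate
pip uninstall -y databricks-connect pyspark   # start from a clean slate
pip install pyspark pytest
deactivate

# Separate environment for remote execution via Databricks Connect:
python -m venv .venv-remote
. .venv-remote/bin/activate
pip install databricks-connect
```

Switching environments then switches which `pyspark` is on `sys.path`, so the same test suite can run fully offline in `.venv-local`.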

benoitLebreton-perso avatar Oct 29 '24 10:10 benoitLebreton-perso

Closing this issue as stale; sorry for not replying in time.

If you are still having similar issues, please check out https://github.com/databricks/databricks-vscode/issues/1540, which has context on how to run pytest in a portable way (so tests can be executed either remotely or locally).
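A portable setup along those lines can be sketched as follows: prefer Databricks Connect when it is importable, otherwise fall back to a local session. The helper names here are illustrative assumptions, not an API from this repo:

```python
import importlib.util


def databricks_connect_available() -> bool:
    """True if the databricks-connect package is importable."""
    try:
        return importlib.util.find_spec("databricks.connect") is not None
    except ModuleNotFoundError:
        # The parent "databricks" package is not installed at all.
        return False


def get_spark():
    """Return a remote session when Databricks Connect is present, else a local one."""
    if databricks_connect_available():
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    from pyspark.sql import SparkSession
    return (
        SparkSession.builder.master("local[*]")
        .appName("local-tests")
        .config("spark.driver.bindAddress", "127.0.0.1")
        .getOrCreate()
    )
```

In a `conftest.py` this can back a session-scoped fixture (`@pytest.fixture(scope="session")` yielding `get_spark()`), so the same test suite runs offline against local PySpark or remotely against a workspace.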

ilia-db avatar Mar 04 '25 09:03 ilia-db