Geostats Functions in Spark Connect

Open james-willis opened this issue 6 months ago • 3 comments

I don't think the stats functions are compatible with Spark Connect today. I tried this with Spark 3.5:

(python) ➜  python git:(graphframes-0.9.0) ✗ export SPARK_REMOTE=local
(python) ➜  python git:(graphframes-0.9.0) ✗ pytest -v tests/stats

and every test that wasn't skipped (for checkpointing) gave this kind of _jvm error:

self = <pyspark.sql.connect.session.SparkSession object at 0x16fd17df0>, name = '_jvm'

    def __getattr__(self, name: str) -> Any:
        if name in ["_jsc", "_jconf", "_jvm", "_jsparkSession"]:
>           raise PySparkAttributeError(
                error_class="JVM_ATTRIBUTE_NOT_SUPPORTED", message_parameters={"attr_name": name}
E               pyspark.errors.exceptions.base.PySparkAttributeError: [JVM_ATTRIBUTE_NOT_SUPPORTED] Attribute `_jvm` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session.

../../../../.local/share/virtualenvs/python-GYLC1Bm8/lib/python3.10/site-packages/pyspark/sql/connect/session.py:692: PySparkAttributeError

james-willis, Jul 15 '25 21:07

Hi @james-willis, I'd like to tackle this issue and make the Geostats functions compatible with Spark Connect. I will focus on refactoring the existing _jvm calls to use Spark Connect-compatible APIs. My local master is up to date, and I am working on the feature/spark-connect-geostats-2103 branch.

Please let me know if there's any specific guidance you have.

Subham-KRLX, Jul 17 '25 03:07

I don't have experience making this kind of change. I think the Python wrappers for the ST functions may already implement something like this.

Part of me wants to deprecate these and point folks to the SQL functions instead. Those already work in Spark Connect. I know that might be controversial.

james-willis, Jul 17 '25 07:07

Thanks for the valuable hint, James! I am already digging into the ST function implementations to understand their approach. I have also noted your thoughts on deprecation and will keep them in mind as I explore the best path to Spark Connect compatibility. I will share updates soon.

Subham-KRLX, Jul 17 '25 09:07