Use SparkSession/DataFrame instead of SparkContext/RDD when writing model metadata
With https://issues.apache.org/jira/browse/SPARK-48909, Spark ML now uses the SparkSession and DataFrame APIs to write model metadata, for example:
```scala
spark.createDataFrame(Seq(Tuple1(metadataJson))).write.text(metadataPath)
```
However, SynapseML still relies on SparkContext and the RDD API in places such as this line. This prevents SynapseML from functioning in environments where RDDs are no longer supported, for example Databricks clusters with Unity Catalog enabled.
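To illustrate the requested change, here is a minimal, self-contained sketch contrasting the two approaches. The object name, the sample `metadataJson` value, and the output path are placeholders for illustration only; they are not taken from SynapseML's actual code.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone sketch; requires Spark on the classpath.
object MetadataWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("metadata-write-sketch")
      .getOrCreate()

    val metadataJson = """{"class":"org.example.MyModel","timestamp":0}"""  // placeholder
    val metadataPath = "/tmp/model/metadata"                                // placeholder

    // Old style (RDD API) -- fails on clusters where direct RDD access is
    // disallowed, e.g. Databricks with Unity Catalog:
    // spark.sparkContext.parallelize(Seq(metadataJson), 1).saveAsTextFile(metadataPath)

    // New style (per SPARK-48909): write the same single-line text file
    // through the DataFrame API instead.
    spark.createDataFrame(Seq(Tuple1(metadataJson))).write.text(metadataPath)

    spark.stop()
  }
}
```

Both variants produce a plain text file containing the metadata JSON, so models saved either way remain readable by the same loading code.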
Would it be possible for SynapseML to adopt the changes introduced in SPARK-48909 to ensure compatibility with such environments?
Hi @qziyuan, yes, we would gladly accept any PRs to update this saving and loading, provided it doesn't break serialization of old-style models.