SNOW-826851: Improve DataFrameReader and DataFrameWriter API
What is the current behavior?
Currently the DataFrameReader and DataFrameWriter APIs differ from the Spark DataFrame APIs in several ways. They are also internally inconsistent: DataFrameReader has option()/options(), but DataFrameWriter does not.
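To make the asymmetry concrete, here is a rough sketch of the current Snowpark surface (hedged: session, my_schema, the @my_stage stage, and the table name are placeholders):

# Reader: option()/options() are available and pass Snowflake format
# options through, e.g. SKIP_HEADER for CSV
df = session.read.schema(my_schema).option("SKIP_HEADER", 1) \
    .csv("@my_stage/zipcodes.csv")

# Writer: no option()/options()/format()/save(); instead there are
# separate methods such as save_as_table() and copy_into_location()
df.write.mode("overwrite").save_as_table("zipcodes")
df.write.copy_into_location("@my_stage/out/",
                            file_format_type="csv",
                            header=True,
                            overwrite=True)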
What is the desired behavior?
Provide an API that is more familiar to developers coming from typical Spark reader/writer code.
How would this improve snowflake-snowpark-python?
It would make it easier for people transitioning to, or evaluating a move to, Snowpark. It might also enable a more natural API for supporting additional formats.
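For illustration only, the desired surface might look like the following hypothetical Snowpark code (none of these writer methods exist today; the names simply mirror Spark):

# Hypothetical: the writer gains option()/options()/format()/save()
df.write.option("header", True).mode("overwrite").csv("@my_stage/zipcodes")
df.write.options(header=True, delimiter=",") \
    .format("csv").mode("overwrite").save("@my_stage/zipcodes")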
References, Other Background
@sfc-gh-mrojas could you please provide concrete examples?
Sure. Here are some common Spark DataFrame writer patterns:
# Write a CSV file with a column header (column names)
df.write.option("header", True) \
    .csv("/tmp/spark_output/zipcodes")

# Other CSV options
df2.write.options(header='True', delimiter=',') \
    .csv("/tmp/spark_output/zipcodes")

# Saving modes
df2.write.mode('overwrite').csv("/tmp/spark_output/zipcodes")

# You can also use format()/save()
df2.write.format("csv").mode('overwrite').save("/tmp/spark_output/zipcodes")
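For comparison, a rough Snowpark equivalent of those writer patterns today goes through copy_into_location(), where format and copy options are keyword arguments rather than chained option() calls (the @my_stage path is a placeholder):

# Snowpark today: one call carries the format, its options, and the
# overwrite behavior that mode('overwrite') expresses in Spark
df.write.copy_into_location(
    "@my_stage/zipcodes",
    file_format_type="csv",
    format_type_options={"FIELD_DELIMITER": ","},
    header=True,
    overwrite=True,
)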
# Path to a US flight delays CSV file
csv_file = "/databricks-datasets/learning-spark-v2/flights/departuredelays.csv"

# Read the file and create a temporary view, inferring the schema
# (for larger files you may want to specify the schema explicitly)
df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .load(csv_file))
df.createOrReplaceTempView("us_delay_flights_tbl")
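A rough Snowpark counterpart of that read (hedged: assumes a Snowpark version that supports the INFER_SCHEMA option for CSV reads, and that the file has been uploaded to a placeholder stage @my_stage):

# Snowpark: read a staged CSV, inferring the schema, then create a temp view
df = (session.read
      .option("INFER_SCHEMA", True)
      .csv("@my_stage/departuredelays.csv"))
df.create_or_replace_temp_view("us_delay_flights_tbl")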
# The same Spark read, this time with an explicit schema
# defined as a DDL string
schema = "date STRING, delay INT, distance INT, origin STRING, destination STRING"
flights_df = spark.read.csv(csv_file, schema=schema)
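The explicit-schema version highlights another divergence: Snowpark's reader takes a StructType via schema() rather than a DDL string (again, @my_stage is a placeholder):

from snowflake.snowpark.types import (
    StructType, StructField, StringType, IntegerType,
)

# Snowpark requires a StructType; DDL-string schemas are not accepted
schema = StructType([
    StructField("date", StringType()),
    StructField("delay", IntegerType()),
    StructField("distance", IntegerType()),
    StructField("origin", StringType()),
    StructField("destination", StringType()),
])
flights_df = session.read.schema(schema).csv("@my_stage/departuredelays.csv")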