
SNOW-826851: Improve DataFrameReader and DataFrameWriter API

Open sfc-gh-mrojas opened this issue 2 years ago • 2 comments

What is the current behavior?

Currently the DataFrameReader and DataFrameWriter APIs differ from the Spark DataFrame APIs in several ways. They are also inconsistent with each other: DataFrameReader has option()/options(), but DataFrameWriter does not.

What is the desired behavior?

Provide an API that more closely matches typical Spark reader/writer code.

How would this improve snowflake-snowpark-python?

It would make things easier for people transitioning to, or looking to move to, Snowpark. It might also enable a more natural API for supporting additional formats.

References, Other Background

sfc-gh-mrojas avatar May 25 '23 01:05 sfc-gh-mrojas

@sfc-gh-mrojas could you please provide concrete examples?

sfc-gh-mkeller avatar May 25 '23 22:05 sfc-gh-mkeller

Sure. Here are some common Spark DataFrameWriter and DataFrameReader patterns:

# Write CSV file with column header (column names)
df.write.option("header", True) \
  .csv("/tmp/spark_output/zipcodes")

# Other CSV options
df2.write.options(header='True', delimiter=',') \
  .csv("/tmp/spark_output/zipcodes")

# Saving modes
df2.write.mode('overwrite').csv("/tmp/spark_output/zipcodes")

# You can also use this
df2.write.format("csv").mode('overwrite').save("/tmp/spark_output/zipcodes")

# Path to our US flight delays CSV file
csv_file = "/databricks-datasets/learning-spark-v2/flights/departuredelays.csv"

# Read and create a temporary view
# Infer schema (note that for larger files you
# may want to specify the schema)
df = (spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .load(csv_file))
df.createOrReplaceTempView("us_delay_flights_tbl")

# Read the same file with an explicit schema given as a DDL string
schema = "date STRING, delay INT, distance INT, origin STRING, destination STRING"
flights_df = spark.read.csv(csv_file, schema=schema)
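
To make the request a bit more concrete, here is a minimal sketch of the kind of chainable writer surface the patterns above rely on. The class name `FluentWriter` and the dictionary it returns are hypothetical, made up for illustration; this is not Snowpark's or PySpark's implementation, and it does not perform any write. It only shows how option()/options()/mode()/format() could accumulate state until a terminal call such as csv() or save().

```python
from typing import Any, Dict


class FluentWriter:
    """Hypothetical Spark-style writer facade (illustration only)."""

    def __init__(self) -> None:
        # Defaults chosen arbitrarily for this sketch.
        self._format = "csv"
        self._mode = "errorifexists"
        self._options: Dict[str, Any] = {}

    def format(self, name: str) -> "FluentWriter":
        self._format = name.lower()
        return self  # returning self is what allows chaining

    def mode(self, save_mode: str) -> "FluentWriter":
        self._mode = save_mode.lower()
        return self

    def option(self, key: str, value: Any) -> "FluentWriter":
        self._options[key] = value
        return self

    def options(self, **kwargs: Any) -> "FluentWriter":
        self._options.update(kwargs)
        return self

    def save(self, path: str) -> Dict[str, Any]:
        # A real writer would execute the write here; this sketch
        # only returns the collected request so it can be inspected.
        return {
            "path": path,
            "format": self._format,
            "mode": self._mode,
            "options": dict(self._options),
        }

    def csv(self, path: str) -> Dict[str, Any]:
        # Shorthand for format("csv").save(path).
        return self.format("csv").save(path)


request = (
    FluentWriter()
    .options(header=True, delimiter=",")
    .mode("overwrite")
    .csv("/tmp/spark_output/zipcodes")
)
```

With a surface like this, the reader and writer would expose the same option()/options() calls, and adding a new format would mostly mean adding another terminal method.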

sfc-gh-mrojas avatar May 25 '23 23:05 sfc-gh-mrojas