spark-postgres
PostgreSQL and Greenplum Data Source for Apache Spark


A library for reading data from and writing data to Greenplum databases with Apache Spark, usable from both Spark SQL and DataFrames.
When writing data from Spark to Greenplum databases, this library is 100x faster than Apache Spark's built-in JDBC data source.
The library is also fully transactional.
Try it now!
CTAS (CREATE TABLE AS SELECT)
CREATE TABLE tbl
USING greenplum
OPTIONS (
  url 'jdbc:postgresql://greenplum:5432/',
  delimiter '\t',
  dbschema 'gptest',
  dbtable 'store_sales',
  user 'gptest',
  password 'test')
AS
SELECT * FROM tpcds_100g.store_sales
WHERE ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520;
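The same CTAS can probably be expressed with the DataFrame API. The sketch below is an assumption, not documented usage of this library: it relies on the `greenplum` short name from `USING greenplum` also working with `DataFrameWriter.format`, and on the fact that Spark routes `saveAsTable` through the same create-table-as-select path as `CREATE TABLE ... USING ... AS SELECT`. The helper name `ctas_to_greenplum` is hypothetical, and the connection values are the placeholders from the SQL above.

```python
from typing import Dict

# Placeholder connection options, copied from the SQL example above.
GREENPLUM_OPTIONS: Dict[str, str] = {
    "url": "jdbc:postgresql://greenplum:5432/",
    "delimiter": "\t",
    "dbschema": "gptest",
    "dbtable": "store_sales",
    "user": "gptest",
    "password": "test",
}

# Same predicate as the WHERE clause of the CTAS statement.
DATE_FILTER = "ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520"


def ctas_to_greenplum(spark, table_name: str = "tbl") -> None:
    """DataFrame-API counterpart of the CTAS example (hypothetical helper).

    `spark` is assumed to be an active pyspark.sql.SparkSession with the
    spark-postgres jar on its classpath. saveAsTable() registers
    `table_name` in the Spark catalog and writes the selected rows through
    the "greenplum" data source.
    """
    df = spark.table("tpcds_100g.store_sales").where(DATE_FILTER)
    df.write.format("greenplum").options(**GREENPLUM_OPTIONS).saveAsTable(table_name)
```

Call it as `ctas_to_greenplum(spark)`. Like the SQL form, `saveAsTable` uses the `errorifexists` save mode by default, so it fails if `tbl` is already in the catalog.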
View & Insert
CREATE TEMPORARY VIEW tbl
USING greenplum
OPTIONS (
  url 'jdbc:postgresql://greenplum:5432/',
  delimiter '\t',
  dbschema 'gptest',
  dbtable 'store_sales',
  user 'gptest',
  password 'test');

INSERT INTO TABLE tbl
SELECT * FROM tpcds_100g.store_sales
WHERE ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520;
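From PySpark, the view-and-insert flow might look like the sketch below. It builds a `CREATE TEMPORARY VIEW ... USING greenplum` statement (the form the Spark SQL guide uses for `USING` data sources) from an options dict, runs it with `spark.sql`, and then appends rows with `DataFrameWriter.insertInto`, which resolves the view name the same way `INSERT INTO TABLE tbl` does. The helper names `sql_string_literal`, `create_view_sql`, and `insert_via_view` are hypothetical.

```python
from typing import Dict


def sql_string_literal(value: str) -> str:
    """Quote a value as a Spark SQL string literal (backslash escaping)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def create_view_sql(view_name: str, options: Dict[str, str]) -> str:
    """Build CREATE TEMPORARY VIEW ... USING greenplum OPTIONS (...)."""
    opts = ",\n  ".join(
        f"{key} {sql_string_literal(val)}" for key, val in options.items()
    )
    return (
        f"CREATE TEMPORARY VIEW {view_name}\n"
        "USING greenplum\n"
        f"OPTIONS (\n  {opts})"
    )


def insert_via_view(spark, df, view_name: str, options: Dict[str, str]) -> None:
    """Register the Greenplum-backed view, then append `df` into it.

    Assumes an active SparkSession with the spark-postgres jar available.
    insertInto() matches columns by position, like INSERT INTO ... SELECT *.
    """
    spark.sql(create_view_sql(view_name, options))
    df.write.insertInto(view_name)
```

For example, passing `spark.table("tpcds_100g.store_sales").where("ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520")` as `df`, `"tbl"` as the view name, and the placeholder options from the SQL example mirrors the statements above.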
Refer to the "JDBC To Other Databases" section of the Spark SQL Guide for similar usage of Spark's built-in JDBC data source.
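The JDBC guide reads tables with `spark.read.format("jdbc")`. By analogy, loading a Greenplum table into a DataFrame through this library might look like the sketch below. It assumes that the `greenplum` short name works with `DataFrameReader.format` and that reads accept the same option keys as the SQL examples; `greenplum_read_options` and `read_greenplum` are hypothetical helper names.

```python
from typing import Dict


def greenplum_read_options(
    url: str,
    dbschema: str,
    dbtable: str,
    user: str,
    password: str,
    delimiter: str = "\t",
) -> Dict[str, str]:
    """Collect the option keys used in the SQL examples into a dict."""
    return {
        "url": url,
        "delimiter": delimiter,
        "dbschema": dbschema,
        "dbtable": dbtable,
        "user": user,
        "password": password,
    }


def read_greenplum(spark, options: Dict[str, str]):
    """Load a Greenplum table as a DataFrame via the "greenplum" source.

    `spark` is assumed to be an active pyspark.sql.SparkSession with the
    spark-postgres jar on its classpath.
    """
    return spark.read.format("greenplum").options(**options).load()
```

For example: `df = read_greenplum(spark, greenplum_read_options("jdbc:postgresql://greenplum:5432/", "gptest", "store_sales", "gptest", "test"))`.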