eel-sdk
How do I create the DDL for a Hive Parquet table with EEL?
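- CsvSource to HiveSink
import java.io.File
import org.apache.avro.Schema
// plus the eel-sdk imports for AvroSchemaFns, CsvSource and HiveSink

// Convert the Avro schema in user.avsc into an EEL schema
val schema = AvroSchemaFns.fromAvroSchema(new Schema.Parser().parse(new File("user.avsc")))
// Read the CSV at `path` with that schema and write it into mydatabase.myTable
CsvSource(path)
  .withSchema(schema)
  .to(HiveSink("mydatabase", "myTable"))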
- Table fields: fname, lname, age, salary
- Two partition keys: country and city
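Written out as Hive column types, that schema is four data columns plus two `string` partition columns. The pure-Scala sketch below shows the type mapping the generated DDL relies on; `ColType`, `StringCol`, `IntCol`, `DecimalCol` and `HiveTypes` are hypothetical stand-ins for EEL's `StringType`, `IntType.Signed` and `DecimalType`, not EEL's API.

```scala
// Hypothetical stand-ins for EEL's schema types; not part of eel-sdk.
sealed trait ColType
case object StringCol extends ColType
case object IntCol extends ColType
final case class DecimalCol(precision: Int, scale: Int) extends ColType

object HiveTypes {
  // Render a column type the way it is spelled in Hive DDL.
  def hiveTypeName(t: ColType): String = t match {
    case StringCol        => "string"
    case IntCol           => "int"
    case DecimalCol(p, s) => s"decimal($p,$s)"
  }

  // Data columns of the example table.
  val fields: Seq[(String, ColType)] = Seq(
    "fname"  -> StringCol,
    "lname"  -> StringCol,
    "age"    -> IntCol,
    "salary" -> DecimalCol(38, 5)
  )

  // Partition keys, kept separate from the data columns.
  val partitions: Seq[(String, ColType)] = Seq(
    "country" -> StringCol,
    "city"    -> StringCol
  )

  def main(args: Array[String]): Unit =
    (fields ++ partitions).foreach { case (name, t) =>
      println(s"$name ${hiveTypeName(t)}")
    }
}
```

The partition keys are kept in their own list because Hive declares them only in the `PARTITIONED BY` clause; a partition column must not also appear among the table's data columns.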
import org.apache.hadoop.hive.metastore.TableType
// plus the eel-sdk imports for HiveDDL, PartitionColumn, Field and the schema types

object EelCreateTableExample extends App {
  // Generate the CREATE TABLE statement for a partitioned Parquet table as a string
  val createTableCommand = HiveDDL.showDDL(
    tableName = "mydatabase.mytable",
    // Partition keys become the PARTITIONED BY clause
    partitions = Seq(
      PartitionColumn("country", StringType),
      PartitionColumn("city", StringType)
    ),
    // Regular data columns
    fields = Seq(
      Field("fname", StringType),
      Field("lname", StringType),
      Field("age", IntType.Signed),
      Field("salary", DecimalType(38, 5))
    ),
    tableType = TableType.EXTERNAL_TABLE,
    location = Some("hdfs://nameservice1/blah/mytable_location"),
    // Parquet SerDe and input/output formats that ship with Hive
    serde = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
    inputFormat = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    outputFormat = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    props = Map.empty,
    tableComment = Some("my lovely table"),
    ifNotExists = true
  )
  println(createTableCommand)
}
- Output:
CREATE EXTERNAL TABLE IF NOT EXISTS `mydatabase.mytable` (
  `fname` string,
  `lname` string,
  `age` int,
  `salary` decimal(38,5))
PARTITIONED BY (
  `country` string,
  `city` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/blah/mytable_location'
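To see how the printed statement is put together from the `showDDL` arguments, here is a simplified pure-Scala sketch that assembles the same shape of `CREATE EXTERNAL TABLE` string. `Column`, `DdlSketch` and `createExternalTable` are hypothetical names for illustration, not EEL's implementation, and the sketch leaves out `props` and `tableComment` handling.

```scala
// Hypothetical column descriptor for this sketch; not part of eel-sdk.
final case class Column(name: String, hiveType: String)

object DdlSketch {
  // Simplified stand-in for the kind of string HiveDDL.showDDL prints.
  def createExternalTable(
      tableName: String,
      fields: Seq[Column],
      partitions: Seq[Column],
      serde: String,
      inputFormat: String,
      outputFormat: String,
      location: Option[String],
      ifNotExists: Boolean
  ): String = {
    // Backtick-quote each column name and put one column per line.
    def cols(cs: Seq[Column]): String =
      cs.map(c => s"  `${c.name}` ${c.hiveType}").mkString(",\n")

    val guard = if (ifNotExists) "IF NOT EXISTS " else ""

    val parts: Seq[String] =
      Seq(s"CREATE EXTERNAL TABLE ${guard}`${tableName}` (\n${cols(fields)})") ++
        // The PARTITIONED BY clause is emitted only when there are partition keys.
        (if (partitions.nonEmpty) Seq(s"PARTITIONED BY (\n${cols(partitions)})") else Nil) ++
        Seq(
          s"ROW FORMAT SERDE\n  '${serde}'",
          s"STORED AS INPUTFORMAT\n  '${inputFormat}'",
          s"OUTPUTFORMAT\n  '${outputFormat}'"
        ) ++
        // LOCATION is optional.
        location.map(loc => s"LOCATION '${loc}'").toList

    parts.mkString("\n")
  }

  val exampleDdl: String = createExternalTable(
    tableName = "mydatabase.mytable",
    fields = Seq(
      Column("fname", "string"),
      Column("lname", "string"),
      Column("age", "int"),
      Column("salary", "decimal(38,5)")
    ),
    partitions = Seq(Column("country", "string"), Column("city", "string")),
    serde = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
    inputFormat = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    outputFormat = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    location = Some("hdfs://nameservice1/blah/mytable_location"),
    ifNotExists = true
  )

  def main(args: Array[String]): Unit = println(exampleDdl)
}
```

Building the statement from a list of optional clauses keeps each piece (partitioning, location) independently switchable, which mirrors how `partitions` and `location` are separate arguments in the EEL call above.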