
EEL DSL for a CLI shell

Open hannesmiller opened this issue 9 years ago • 2 comments


  • A Scala DSL for EEL commands.
  • The Scala REPL to be used as an interactive shell
  • Scala variables, loops and conditional statements can be combined with DSL commands to easily script tasks
  • A bootstrap shell script called eel-shell, similar to spark-shell, that automatically imports the EEL DSL packages
  • It also automatically imports OS packages so that OS commands can be run and combined with the DSL

Import

Import data with options from an EEL source to a sink

import from jdbc with driver=blah,url=blah,sql=blah to hive with db=blah,table=blah

import from jdbc with driver=blah,url=blah,sql=blah to Parquet with path=blah
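One way the import command could be modelled in Scala is sketched below. Since `import` and `with` are reserved words in Scala, the sketch spells them `importFrom` and `withOptions`. Every name here (`Endpoint`, `ImportCommand`, `importFrom`, `withOptions`, `parseOptions`) is hypothetical and not part of eel-sdk; the sketch only builds a description of the command and does not move any data.

```scala
// Hypothetical sketch: none of these names come from eel-sdk.
object EelDslSketch {

  // A source or sink kind ("jdbc", "hive", "parquet") plus its key=value options.
  final case class Endpoint(kind: String, options: Map[String, String] = Map.empty) {
    // Adds options given as a comma-separated list, e.g. "db=blah,table=blah".
    def withOptions(spec: String): Endpoint =
      copy(options = options ++ parseOptions(spec))
  }

  // A fully described import: where rows come from and where they go.
  final case class ImportCommand(source: Endpoint, sink: Endpoint)

  final class ImportBuilder(source: Endpoint) {
    def to(sink: Endpoint): ImportCommand = ImportCommand(source, sink)
  }

  // Splits "k1=v1,k2=v2" into a map. Only the first '=' in each pair separates
  // key from value. Values containing commas (likely for sql=...) are NOT
  // supported by this naive parser.
  def parseOptions(spec: String): Map[String, String] =
    spec.split(',').iterator.map(_.trim).filter(_.nonEmpty).map { pair =>
      pair.split("=", 2) match {
        case Array(k, v) => k.trim -> v.trim
        case _ => throw new IllegalArgumentException(s"expected key=value, got '$pair'")
      }
    }.toMap

  val jdbc: Endpoint = Endpoint("jdbc")
  val hive: Endpoint = Endpoint("hive")
  val parquet: Endpoint = Endpoint("parquet")

  def importFrom(source: Endpoint): ImportBuilder = new ImportBuilder(source)
}
```

With `import EelDslSketch._` in scope, the first command above would read `importFrom(jdbc.withOptions("driver=blah,url=blah,sql=blah")).to(hive.withOptions("db=blah,table=blah"))`. Because the result is an ordinary Scala value, it can be built inside loops or conditionals in the REPL.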

Export

Export data with options to an EEL sink from a source

export to hive with db=blah,table=blah from jdbc with driver=blah,url=blah,sql=blah

export to hive with db=blah,table=blah from Parquet with path=blah

  • Note: allow a transform sub-command at the appropriate place for custom transformations on the underlying rows in the frame.
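A transform step of this kind could be as simple as a row-to-row function applied between the source and the sink. The sketch below is an assumption about the shape only: `Row`, `Frame`, and `transform` are hypothetical stand-ins defined here, not the EEL frame API.

```scala
// Hypothetical sketch: Row and Frame are stand-ins, not EEL types.
object TransformSketch {

  // A row as column name -> value.
  type Row = Map[String, Any]

  // An in-memory frame; a real implementation would likely stream rows.
  final case class Frame(rows: Seq[Row]) {
    // Applies a user-supplied function to every row and returns a new frame.
    def transform(f: Row => Row): Frame = Frame(rows.map(f))
  }
}
```

For example, `frame.transform(r => r.updated("name", r("name").toString.toUpperCase))` would upper-case a `name` column before the rows reach the sink.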

More commands to follow...

hannesmiller, Jan 12 '17 09:01

DDL

Display the create table DDL for a JDBC query:

ddl from JDBC with driver=blah, url=jdbc:blah, sql=blah, dialect=Parquet, location=blah, partitions=p1:string,p2:int

  • partitions are specified using the notation name:type
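The name:type notation could be parsed along the following lines. This is a minimal sketch: `PartitionColumn`, `parsePartitions`, and `partitionedBy` are hypothetical names, and the rendered clause assumes a Hive-style `PARTITIONED BY` as the target DDL.

```scala
// Hypothetical sketch: these names are illustrative, not part of eel-sdk.
object PartitionSpecSketch {

  final case class PartitionColumn(name: String, dataType: String)

  // Parses "p1:string,p2:int" into an ordered list of partition columns.
  def parsePartitions(spec: String): List[PartitionColumn] =
    spec.split(',').toList.map(_.trim).filter(_.nonEmpty).map { item =>
      item.split(":", 2) match {
        case Array(n, t) if n.trim.nonEmpty && t.trim.nonEmpty =>
          PartitionColumn(n.trim, t.trim)
        case _ =>
          throw new IllegalArgumentException(s"expected name:type, got '$item'")
      }
    }

  // Renders a Hive-style partition clause, or an empty string if there are none.
  def partitionedBy(cols: List[PartitionColumn]): String =
    if (cols.isEmpty) ""
    else cols.map(c => s"${c.name} ${c.dataType}").mkString("PARTITIONED BY (", ", ", ")")
}
```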

hannesmiller, Jan 12 '17 19:01

File Compaction

Merge several small files into a single file for a folder or Hive table.

Note: for Parquet or similar formats, the new file's schema should be the union of all the input schemas.

In addition, you may have to add columns to the Hive Metastore.

  • compact Parquet path
  • compact Orc path
  • compact Hive dbName table
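The schema-union requirement for compaction could be sketched as below. `Field`, `Schema`, `union`, and `missingFrom` are hypothetical names rather than EEL types, and the conflict rule (the same column name with two different types is reported as an error) is an assumption, since the issue does not say how such conflicts should be resolved.

```scala
// Hypothetical sketch: Field and Schema are stand-ins, not EEL types.
object SchemaUnionSketch {

  final case class Field(name: String, dataType: String)
  type Schema = List[Field]

  // Union of all input schemas, keeping first-seen column order.
  // Returns Left with a message if one name appears with two different types.
  def union(schemas: Seq[Schema]): Either[String, Schema] =
    schemas.flatten.foldLeft[Either[String, Schema]](Right(Nil)) {
      case (Right(acc), field) =>
        acc.find(_.name == field.name) match {
          case None => Right(acc :+ field)
          case Some(existing) if existing.dataType == field.dataType => Right(acc)
          case Some(existing) =>
            Left(s"column '${field.name}' has types ${existing.dataType} and ${field.dataType}")
        }
      case (failed, _) => failed
    }

  // Columns in the merged schema that the table definition does not yet have,
  // i.e. the ones that would need adding to the metastore.
  def missingFrom(table: Schema, merged: Schema): Schema =
    merged.filterNot(f => table.exists(_.name == f.name))
}
```

Returning an `Either` keeps the decision about type conflicts with the caller, which could abort the compaction or apply its own widening rule.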

hannesmiller, Jan 17 '17 09:01