Hannes Miller
- Spark can partition data in a **JDBC** data frame by specifying the following binding parameters: **lowerBound**, **upperBound** and **numPartitions** (all **longs**), plus a **partition key** column -...
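A minimal sketch of the stride arithmetic behind those parameters: each of the `numPartitions` readers is given a `WHERE` clause over the partition column. The function and column names here are hypothetical illustrations, not Spark's actual internals (Spark additionally handles nulls and open-ended first/last partitions).

```scala
// Sketch: derive per-partition WHERE predicates from the JDBC binding
// parameters. Each predicate would be pushed down to one parallel reader.
def partitionPredicates(column: String,
                        lowerBound: Long,
                        upperBound: Long,
                        numPartitions: Int): Seq[String] = {
  val stride = (upperBound - lowerBound) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = lowerBound + i * stride
    if (i == 0) s"$column < ${lo + stride}"            // first partition also sweeps up values below lowerBound
    else if (i == numPartitions - 1) s"$column >= $lo" // last partition sweeps up values above upperBound
    else s"$column >= $lo AND $column < ${lo + stride}"
  }
}

// e.g. partitionPredicates("id", 0L, 100L, 4) yields four predicates
// covering the whole id range with stride 25.
```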
An overwrite option for all sink types that write to HDFS

**Proposal**: `Sink.withOverwrite`

**Affected sinks**:
- Parquet
- AvroParquet
- Avro
- Orc
- Csv
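A hypothetical sketch of the API shape the proposal suggests: `withOverwrite` returns a copy of the sink carrying an overwrite flag that HDFS-backed sinks can honour (e.g. delete existing files before writing). Everything here other than the `withOverwrite` name is an assumption for illustration.

```scala
// Assumed API shape for the proposal; not EEL's actual types.
trait HdfsSink {
  def overwrite: Boolean
  def withOverwrite(overwrite: Boolean): HdfsSink
}

// Each affected sink would implement it as an immutable copy:
case class ParquetSink(path: String, overwrite: Boolean = false) extends HdfsSink {
  def withOverwrite(overwrite: Boolean): ParquetSink = copy(overwrite = overwrite)
}
```

Usage would then read naturally at the end of a pipeline: `ParquetSink("/data/out").withOverwrite(true)`.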
For documentation: Spark is great at parallel processing of data already in a distributed store like **HDFS**, but it's not really designed for ingesting data at rest from a non-distributed store...
The experimental _kite dataset sink_ exists for _Flume 1.6.0_, which on the face of it has the capability of ingesting directly into Hive tables. - See the following...
EEL DSL for a CLI shell
- A Scala DSL for EEL commands.
- The Scala REPL to be used as an interactive shell.
- Scala variables, loops and conditional...
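A hypothetical sketch of what such a DSL could look like when loaded into the REPL; the `table` and `copy` verbs below are invented for illustration and are not EEL commands.

```scala
// Invented command vocabulary, purely to show the DSL-in-REPL idea.
object EelShell {
  case class Table(db: String, name: String)
  def table(db: String, name: String): Table = Table(db, name)
  def copy(from: Table, to: Table): String =
    s"copying ${from.db}.${from.name} -> ${to.db}.${to.name}"
}

import EelShell._

// Because commands are plain Scala, REPL users get variables,
// loops and conditionals for free:
val src = table("mydb", "users")
for (target <- Seq("backup", "staging")) copy(src, table(target, "users"))
```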
I just want to confirm that we respect the following Hive properties outlined in the documentation:
- Hive ORC configuration: http://orc.apache.org/docs/hive-config.html

My only concern is that our Orc dialect for Hive Source...
Support distributed writes with EEL
- N writers via JdbcSource -> KafkaSink
- N writers via HiveSink/KuduSink/HBaseSink
- Now what if the **HiveSink** and others that use a **LinkedBlockingQueue** to...
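A minimal, self-contained sketch of the queue-draining pattern mentioned above: producers put rows onto a bounded `LinkedBlockingQueue` and N writer threads drain it, stopping on a poison-pill marker. The names and the poison-pill convention are assumptions for illustration, not EEL's actual implementation.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Bounded queue: producers block when writers fall behind (backpressure).
val queue = new LinkedBlockingQueue[String](1000)
val Poison = "__POISON__" // sentinel telling a writer thread to stop
val written = new AtomicInteger(0)

val numWriters = 4
val pool = Executors.newFixedThreadPool(numWriters)
(1 to numWriters).foreach { _ =>
  pool.submit(new Runnable {
    def run(): Unit = {
      var row = queue.take()
      while (row != Poison) {
        written.incrementAndGet() // stand-in for the actual format write
        row = queue.take()
      }
    }
  })
}

// Producer side: enqueue rows, then one poison pill per writer.
(1 to 100).foreach(i => queue.put(s"row-$i"))
(1 to numWriters).foreach(_ => queue.put(Poison))

pool.shutdown()
pool.awaitTermination(10, TimeUnit.SECONDS)
```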
- CSVSource to HiveSink ```scala val schema = AvroSchemaFns.fromAvroSchema(new Schema.Parser().parse(new File("user.avsc"))) CsvSource(path) .withSchema(schema) .to(HiveSink("mydatabase", "myTable")) ``` - Table field: **fname**, **lname**, **age**, **salary** - 2 partition keys of **country** and...
I have already written a PoC based on EEL 1.1.x - let's review it. Note we should base this on the latest Apache Ignite 2.x.
An enhancement request, if possible: when an exception is thrown from the underlying format writer (Parquet, Orc), it would be nice if we could trap the exception higher up...
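One way this could look, sketched with `scala.util.Try`: the sink wraps each write so callers see a typed failure carrying the offending row, instead of the raw Parquet/Orc exception escaping from the writer thread. The `SinkWriteException` and `safeWrite` names are hypothetical.

```scala
import scala.util.{Failure, Try}

// Hypothetical wrapper exception surfacing the row that failed to write.
final case class SinkWriteException(row: Any, cause: Throwable)
  extends RuntimeException(s"failed to write row: $row", cause)

// Wrap an underlying format write so the caller can handle the failure
// higher up (log, skip, or abort) rather than losing the writer thread.
def safeWrite[A](row: A)(write: A => Unit): Try[Unit] =
  Try(write(row)).recoverWith {
    case t => Failure(SinkWriteException(row, t))
  }
```

A caller could then pattern match on the result, e.g. `safeWrite(row)(orcWriter.write)` and decide per-row whether to continue.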