Hannes Miller
- Spark can partition data in a **JDBC** data frame by specifying the following binding parameters: **lowerBound**, **upperBound** and **numPartitions** (all **longs**), plus a **partition key** column -...
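A minimal sketch of the stride arithmetic behind those parameters: each of the `numPartitions` readers is given a `WHERE` clause over the partition column. The function and column names here are hypothetical illustrations, not Spark's actual internals (Spark additionally handles nulls and open-ended first/last partitions).

```scala
// Sketch: derive per-partition WHERE predicates from the JDBC binding
// parameters. Each predicate would be pushed down to one parallel reader.
def partitionPredicates(column: String,
                        lowerBound: Long,
                        upperBound: Long,
                        numPartitions: Int): Seq[String] = {
  val stride = (upperBound - lowerBound) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = lowerBound + i * stride
    if (i == 0) s"$column < ${lo + stride}"            // first partition also sweeps up values below lowerBound
    else if (i == numPartitions - 1) s"$column >= $lo" // last partition sweeps up values above upperBound
    else s"$column >= $lo AND $column < ${lo + stride}"
  }
}

// e.g. partitionPredicates("id", 0L, 100L, 4) yields four predicates
// covering the whole id range with stride 25.
```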
An overwrite option for all sink types that write to HDFS

**Proposal**: `Sink.withOverwrite`

**Affected sinks**:
- Parquet
- AvroParquet
- Avro
- Orc
- Csv
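A hypothetical sketch of the API shape the proposal suggests: `withOverwrite` returns a copy of the sink carrying an overwrite flag that HDFS-backed sinks can honour (e.g. delete existing files before writing). Everything here other than the `withOverwrite` name is an assumption for illustration.

```scala
// Assumed API shape for the proposal; not EEL's actual types.
trait HdfsSink {
  def overwrite: Boolean
  def withOverwrite(overwrite: Boolean): HdfsSink
}

// Each affected sink would implement it as an immutable copy:
case class ParquetSink(path: String, overwrite: Boolean = false) extends HdfsSink {
  def withOverwrite(overwrite: Boolean): ParquetSink = copy(overwrite = overwrite)
}
```

Usage would then read naturally at the end of a pipeline: `ParquetSink("/data/out").withOverwrite(true)`.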
For documentation: Spark is great at parallel processing of data already in a distributed store like **HDFS**, but it's not really designed for ingesting data at rest from a non-distributed store...
The experimental _kite dataset sink_ exists for _Flume 1.6.0_, which on the face of it has the capability of ingesting directly into Hive tables. - See the following...
EEL DSL for a CLI shell
- A Scala DSL for EEL commands.
- The Scala REPL to be used as an interactive shell.
- Scala variables, loops and conditional...
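A hypothetical sketch of what such a DSL could look like when loaded into the REPL; the `table` and `copy` verbs below are invented for illustration and are not EEL commands.

```scala
// Invented command vocabulary, purely to show the DSL-in-REPL idea.
object EelShell {
  case class Table(db: String, name: String)
  def table(db: String, name: String): Table = Table(db, name)
  def copy(from: Table, to: Table): String =
    s"copying ${from.db}.${from.name} -> ${to.db}.${to.name}"
}

import EelShell._

// Because commands are plain Scala, REPL users get variables,
// loops and conditionals for free:
val src = table("mydb", "users")
for (target <- Seq("backup", "staging")) copy(src, table(target, "users"))
```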
I just want to confirm that we respect the following Hive properties outlined in the documentation:
- Hive ORC configuration: http://orc.apache.org/docs/hive-config.html

My only concern is that our Orc dialect for Hive Source...
Support distributed writes with EEL
- N writers via JdbcSource -> KafkaSink
- N writers via HiveSink/KuduSink/HBaseSink
- Now what if the **HiveSink** and others that use a **LinkedBlockingQueue** to...
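A minimal, self-contained sketch of the queue-draining pattern mentioned above: producers put rows onto a bounded `LinkedBlockingQueue` and N writer threads drain it, stopping on a poison-pill marker. The names and the poison-pill convention are assumptions for illustration, not EEL's actual implementation.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Bounded queue: producers block when writers fall behind (backpressure).
val queue = new LinkedBlockingQueue[String](1000)
val Poison = "__POISON__" // sentinel telling a writer thread to stop
val written = new AtomicInteger(0)

val numWriters = 4
val pool = Executors.newFixedThreadPool(numWriters)
(1 to numWriters).foreach { _ =>
  pool.submit(new Runnable {
    def run(): Unit = {
      var row = queue.take()
      while (row != Poison) {
        written.incrementAndGet() // stand-in for the actual format write
        row = queue.take()
      }
    }
  })
}

// Producer side: enqueue rows, then one poison pill per writer.
(1 to 100).foreach(i => queue.put(s"row-$i"))
(1 to numWriters).foreach(_ => queue.put(Poison))

pool.shutdown()
pool.awaitTermination(10, TimeUnit.SECONDS)
```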
- CSVSource to HiveSink ```scala val schema = AvroSchemaFns.fromAvroSchema(new Schema.Parser().parse(new File("user.avsc"))) CsvSource(path) .withSchema(schema) .to(HiveSink("mydatabase", "myTable")) ``` - Table field: **fname**, **lname**, **age**, **salary** - 2 partition keys of **country** and...
I have already written a PoC based on EEL 1.1.x - let's review it. Note we should base this on the latest Apache Ignite 2.x.
An enhancement request, if possible: when an exception is thrown from the underlying format writer (Parquet, Orc), it would be nice if we could trap the exception higher up...
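One way this could look, sketched with `scala.util.Try`: the sink wraps each write so callers see a typed failure carrying the offending row, instead of the raw Parquet/Orc exception escaping from the writer thread. The `SinkWriteException` and `safeWrite` names are hypothetical.

```scala
import scala.util.{Failure, Try}

// Hypothetical wrapper exception surfacing the row that failed to write.
final case class SinkWriteException(row: Any, cause: Throwable)
  extends RuntimeException(s"failed to write row: $row", cause)

// Wrap an underlying format write so the caller can handle the failure
// higher up (log, skip, or abort) rather than losing the writer thread.
def safeWrite[A](row: A)(write: A => Unit): Try[Unit] =
  Try(write(row)).recoverWith {
    case t => Failure(SinkWriteException(row, t))
  }
```

A caller could then pattern match on the result, e.g. `safeWrite(row)(orcWriter.write)` and decide per-row whether to continue.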