spark-avro

Avro data from DStream

Open lemanuel opened this issue 9 years ago • 3 comments

Is it possible to create a DataFrame from generic Avro records that arrive in a DStream? In the tests I have seen each RDD written to a temp file and then read back with spark-avro, but I do not want to add that extra step to the process.

lemanuel avatar Dec 12 '16 09:12 lemanuel

You can do this with DStream.foreachRDD { rdd => val df = rdd.toDF ... } using the code in https://github.com/databricks/spark-avro/pull/216.

cbyn avatar Feb 09 '17 18:02 cbyn
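
For readers who cannot pull the PR, the foreachRDD pattern can also be sketched by hand, without any temp file. This is only an illustration under stated assumptions, not the code from the PR: it assumes Spark 2.x or later (SparkSession), a flat record with a string field `name` and an int field `count`, and the names `AvroStreamSketch`, `sqlSchema`, `toRow` and `process` are all made up for the example.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper object; none of these names come from spark-avro.
object AvroStreamSketch {
  // Assumed record layout: { "name": string, "count": int }.
  val sqlSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("count", IntegerType, nullable = false)
  ))

  // Convert one Avro record to a Spark Row.
  // Avro strings are typically org.apache.avro.util.Utf8, so call toString on them.
  def toRow(record: GenericRecord): Row = {
    val name = Option(record.get("name")).map(_.toString).orNull
    val count = record.get("count").asInstanceOf[Int]
    Row(name, count)
  }

  // Build a DataFrame for every micro-batch directly from the RDD.
  def process(stream: DStream[GenericRecord], spark: SparkSession): Unit =
    stream.foreachRDD { rdd =>
      val df = spark.createDataFrame(rdd.map(toRow), sqlSchema)
      df.show() // replace with whatever the job should do with the DataFrame
    }
}
```

The schema is written out by hand here, which only works for simple, known record layouts; nested records, unions, arrays and maps would need a proper Avro-to-SQL type conversion.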

Could you please confirm whether the approach below is right? I am not able to create a DataFrame after pulling your code.

DStream.foreachRDD { rdd => val df = rddToDataFrame(rdd) }

ananth3010 avatar Jun 08 '17 07:06 ananth3010

I implemented it as an implicit on RDD[GenericRecord]. If you import RddUtils.RddToDataFrame then you can call toDF on the RDD as I posted above.

cbyn avatar Jun 08 '17 14:06 cbyn
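
To illustrate what "an implicit on RDD[GenericRecord]" means in practice, here is a minimal sketch of the enrichment pattern. It is not the implementation from the PR: `GenericRecordSyntax`, `GenericRecordRddOps` and `toDataFrame` are hypothetical names, the caller is assumed to supply the Spark SQL schema, and only flat records with primitive and string fields are handled.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical syntax object; importing its members adds a method to RDD[GenericRecord].
object GenericRecordSyntax {
  implicit class GenericRecordRddOps(rdd: RDD[GenericRecord]) {
    // Assumes every field named in `schema` exists in the Avro records.
    def toDataFrame(spark: SparkSession, schema: StructType): DataFrame = {
      val fieldNames = schema.fieldNames // local copy, so the closure does not capture `this`
      val rows = rdd.map { record =>
        val values = fieldNames.map { name =>
          record.get(name) match {
            case s: CharSequence => s.toString // Avro Utf8 -> String
            case other           => other      // boxed primitives pass through unchanged
          }
        }
        Row.fromSeq(values.toSeq)
      }
      spark.createDataFrame(rows, schema)
    }
  }
}
```

With `import GenericRecordSyntax._` in scope, the call inside foreachRDD would read `val df = rdd.toDataFrame(spark, schema)`. The method only becomes visible once the implicit class is imported, which is one possible reason a call compiles for one person and not for another.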