Avro data from DStream
Is it possible to create a DataFrame from generic Avro records that are in a DStream? In the tests I have seen each RDD written to a temp file and then read back with spark-avro, but I do not want to add another step to the process.
You can do this with DStream.foreachRDD { rdd => val df = rdd.toDF ... }, using the code in https://github.com/databricks/spark-avro/pull/216.
Could you please confirm whether the approach below is right? I am not able to create a DataFrame after pulling your code.
DStream.foreachRDD { rdd => df = rddToDataFrame(rdd) }
I implemented it as an implicit on RDD[GenericRecord], not as a standalone rddToDataFrame function. If you import RddUtils.RddToDataFrame, you can call toDF directly on the RDD, as in the snippet I posted above.
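For readers who want to see the general shape of the "implicit on RDD[GenericRecord]" pattern, here is a minimal sketch. It is not the code from the pull request: the object AvroRddSketch, the class GenericRecordRddOps, the explicit SparkSession parameter on toDF, and the two-field layout (a long "id" and a string "name") are all assumptions made for illustration. It converts each record to a Row by hand against a fixed schema, which avoids the temp-file round trip but does not derive the schema from Avro automatically.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper for illustration only; NOT the implementation in the PR.
object AvroRddSketch {

  // Assumed record layout: a non-null long "id" and a nullable string "name".
  val schema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("name", StringType, nullable = true)
  ))

  // Convert one Avro record to a Spark Row matching `schema`.
  // Avro strings usually arrive as Utf8, so toString is called on them.
  def recordToRow(record: GenericRecord): Row = {
    val name = Option(record.get("name")).map(_.toString).orNull
    Row(record.get("id").asInstanceOf[Long], name)
  }

  // Adds a toDF method to RDD[GenericRecord] wherever this class is in scope.
  implicit class GenericRecordRddOps(val rdd: RDD[GenericRecord]) extends AnyVal {
    def toDF(spark: SparkSession): DataFrame =
      spark.createDataFrame(rdd.map(recordToRow), schema)
  }

  // Build a DataFrame for every micro-batch of the stream.
  def process(stream: DStream[GenericRecord], spark: SparkSession): Unit =
    stream.foreachRDD { rdd =>
      val df = rdd.toDF(spark)
      df.show()
    }
}
```

Outside of AvroRddSketch, the extension method only becomes visible after import AvroRddSketch._ (or an import of the implicit class itself), which is the same reason the RddUtils.RddToDataFrame import is needed before toDF can be called on the RDD.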