bijection-avro sometimes deserializes objects to GenericData.Record instead of the requested type
I defined an Avro schema and used SBT Avrohugger to generate the Scala code. Serialization and deserialization so far works on my local machine. I am doing something like this:
val x: Array[Byte] = ... // Get the serialized data
val myThing = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$).invert(x)
When I run this locally, it works perfectly. I now created a Spark task that can be submitted to Spark with the help of the SBT Assembly plugin. When I "submit" this task locally (using spark-submit --master local[*]), this serialization works. However, when I submit it to a "real" Spark installation, I get a CCE:
Exception in thread "main" java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.example.avro.MyAvroThing
So, the deserializer does not recognize the format and deserializes it to a generic Avro type. I double checked that all necessary Avro libraries and Twitter's Bijection-Avro are correctly embedded in my resulting JAR.
As a next investigation step, I analyzed the GenericData.Record I get by doing:
val mystery = SpecificAvroCodecs.toBinary[MyAvroThing](MyAvroThing.SCHEMA$).invert(x).asInstanceOf[Try[Any]]
mystery.get match {
case _: MyAvroThing => println("ok!")
case r: GenericData.Record => println("Got a generic record with schema: " + r.getSchema.getFields.map(_.name()).mkString(", "))
case _ => println("Got something completely different")
}
When I run this locally, it prints out ok! as it correctly gets the MyAvroThing. When I run this on the Spark cluster, I get:
Got a generic record with schema: foo, bar, quux
this means, my schema IS honored by the deserializer and it is deserialized correctly, only the transformation to the resulting class is not done somehow.
When I query the record's fields by name, I get the correct data I expect in my MyAvroThing.
What is going wrong here?
I wonder if the issue could be a classpath issue. Locally you have one version of avro, on the cluster you have another and it shows up as a runtime error.