Task not serializable

Open Make42 opened this issue 9 years ago • 1 comments

Using a Jupyter notebook with Apache Toree as the kernel,

import play.api.libs.json._

case class Person(name: String, lovesPandas: Boolean)
implicit val personFormat = Json.format[Person]

val text = """{"name":"Sparky The Bear", "lovesPandas":true}"""

val jsonParse = Json.parse(text)
val result = Json.fromJson[Person](jsonParse)
result.get

works, while

import org.apache.spark._
import play.api.libs.json._
import play.api.libs.functional.syntax._

case class Person(name: String, lovesPandas: Boolean)
implicit val personReads = Json.format[Person]

val text = """{"name":"Sparky The Bear", "lovesPandas":true}"""

val input = sc.parallelize(List(text))
val parsed = input.map(Json.parse(_))
val result = parsed.flatMap(record => {    
    personReads.reads(record).asOpt
})
result.filter(_.lovesPandas).map(Json.toJson(_)).saveAsTextFile("files/out/pandainfo.json")

fails with

Name: org.apache.spark.SparkException
Message: Task not serializable
StackTrace: org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
[...]

Is there a more general problem here?

Make42, May 19 '16 16:05

You could make the entire class serializable: in Java, have the class implement the `Serializable` interface; in Scala, mix in the `Serializable` trait.
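As a minimal sketch of that suggestion, assuming Scala 2.12 or 2.13 and no Spark on the classpath: Spark checks a task closure by running it through Java serialization, so the same check can be reproduced with a plain `ObjectOutputStream`. The names `PlainParser`, `SerializableParser`, and `SerializationCheck` are hypothetical and only illustrate the idea; they are not from the issue's code.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper that does NOT extend Serializable.
class PlainParser { def parse(s: String): Int = s.length }

// The same helper with the Serializable trait mixed in.
class SerializableParser extends Serializable { def parse(s: String): Int = s.length }

object SerializationCheck {
  // Attempts Java serialization of obj and reports whether it succeeded.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val plain = new PlainParser
    val safe = new SerializableParser

    // Each closure captures its helper, so serializing the closure
    // also has to serialize the captured object.
    val capturesPlain: String => Int = s => plain.parse(s)
    val capturesSafe: String => Int = s => safe.parse(s)

    println(s"closure over PlainParser serializable: ${isSerializable(capturesPlain)}")
    println(s"closure over SerializableParser serializable: ${isSerializable(capturesSafe)}")
  }
}
```

The first closure should fail the check because the object it captures cannot be serialized, which is the same kind of failure that surfaces in Spark as "Task not serializable". In a notebook, the captured object may be the enclosing cell or wrapper rather than a class you wrote yourself, so this sketch shows the mechanism rather than a guaranteed fix for the code above.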

tahervali, Nov 21 '19 11:11