Schema for converting a PySpark DataFrame into a LeapFrame
I'm creating a schema from a PySpark DataFrame that can be used to build a corresponding LeapFrame. My question is: how do I handle non-scalar types such as arrays?
My schema looks like this:
df.printSchema()
root
|-- scalar_1: string (nullable = true)
|-- scalar_2: double (nullable = true)
|-- vector_1: array (nullable = true)
| |-- element: string (containsNull = true)
|-- vector_2: array (nullable = true)
| |-- element: date (containsNull = true)
If the PySpark DataFrame consisted purely of scalar columns, I would do the following:
schema = [{"name": field.name,
           "type": field.dataType.simpleString()}
          for field in df.schema]
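For the array columns, one idea is to branch on each field's dataType and emit a nested type descriptor instead of a flat type string. Here is a sketch; note the {"type": "list", "base": ...} shape is just a guess at an encoding a LeapFrame builder could consume, not a documented MLeap format:

from pyspark.sql.types import ArrayType

def field_to_dict(field):
    # Scalar columns keep the flat {"name", "type"} shape from above.
    if not isinstance(field.dataType, ArrayType):
        return {"name": field.name, "type": field.dataType.simpleString()}
    # Array columns get a nested descriptor built from the element type.
    # NOTE: "list"/"base" is an assumed encoding, not necessarily what
    # MLeap's LeapFrame JSON actually expects.
    return {"name": field.name,
            "type": {"type": "list",
                     "base": field.dataType.elementType.simpleString()}}

schema = [field_to_dict(field) for field in df.schema]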
Any suggestions are welcome.
Hey @femibyte, we have converters for this, e.g. https://github.com/combust/mleap/blob/e177e5816250ff0dfae7f82e84317f288a107f31/mleap-spark-base/src/main/scala/ml/combust/mleap/spark/SparkSupport.scala#L63, which would do this for you. They're currently Scala-only, but I'll update the issue once I've gotten them working in Python; you shouldn't have to do this conversion manually.
Is this updated for Python?