pytd
Spark + Java 11 requires additional config param for Apache Arrow
According to the latest Spark documentation:
For Java 11, -Dio.netty.tryReflectionSetAccessible=true is required additionally for Apache Arrow library. This prevents java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.(long, int) not available when Apache Arrow uses Netty internally.
We'd need to tweak pytd/spark.py to handle this case. For example:
from pyspark import SparkConf

conf = (
    SparkConf()
    .setMaster("local[*]")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.sql.execution.arrow.pyspark.enabled", "true")
    # Required on Java 11 so Arrow (via Netty) can access sun.misc.Unsafe
    .set("spark.executor.extraJavaOptions", "-Dio.netty.tryReflectionSetAccessible=true")
    .set("spark.driver.extraJavaOptions", "-Dio.netty.tryReflectionSetAccessible=true")
)
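Since the flag is only needed on Java 11+, one way pytd/spark.py could handle this is to detect the JVM major version and add the option conditionally. Below is a minimal sketch using only the standard library; the function names (`parse_java_major_version`, `arrow_java_options`) are hypothetical helpers, not existing pytd APIs.

```python
import re
import subprocess


def parse_java_major_version(version_output):
    """Parse the major version out of `java -version` output.

    Handles both the legacy scheme ("1.8.0_292" -> 8) and the
    modern scheme ("11.0.11" -> 11). Returns None if unparseable.
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not m:
        return None
    major = int(m.group(1))
    if major == 1 and m.group(2):  # legacy "1.x" scheme: x is the major
        major = int(m.group(2))
    return major


def detect_java_major_version():
    """Best-effort JVM version detection; `java -version` writes to stderr."""
    try:
        proc = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_java_major_version(proc.stderr)


def arrow_java_options(major_version):
    """Extra JVM flag Apache Arrow needs on Java 11+, per the Spark docs."""
    if major_version is not None and major_version >= 11:
        return "-Dio.netty.tryReflectionSetAccessible=true"
    return None
```

The string returned by `arrow_java_options` could then be appended to both `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` when building the `SparkConf`, leaving Java 8 setups untouched.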