
[HUDI-4892] Fix hudi-spark3-bundle

Open yihua opened this issue 3 years ago • 2 comments

Change Logs

This PR fixes the hudi-spark3-bundle. Before this PR, reading a Hudi table through the Spark datasource in the Spark 3.3 shell with hudi-spark3-bundle threw the following exception, because some classes were not packaged into the spark3 bundle.

scala> val df = spark.read.format("hudi").load("<table_path>")
java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.hudi.Spark32PlusDefaultSource not found
  at java.util.ServiceLoader.fail(ServiceLoader.java:239)
  at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
  at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:372)
  at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
  at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
  at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
  at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
  at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
  at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
  at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
  at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:725)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)
  ... 47 elided 
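
The error above means a provider name is registered for `org.apache.spark.sql.sources.DataSourceRegister`, but the class itself cannot be loaded. A quick way to confirm this kind of packaging gap is to compare the service registration inside the bundle jar against the class files it actually contains. The following is a minimal sketch, not part of this PR: the class name `BundleServiceCheck` and the method `missingProviders` are hypothetical, and it assumes the registration lives in the bundle's own `META-INF/services` file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Hudi): lists providers that a jar registers
// for DataSourceRegister but does not actually contain as .class entries.
public class BundleServiceCheck {
    // Service file name taken from the interface named in the stack trace.
    static final String SERVICE_FILE =
        "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister";

    /** Returns registered provider class names whose .class entry is missing from the jar. */
    static List<String> missingProviders(String jarPath) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry services = jar.getJarEntry(SERVICE_FILE);
            if (services == null) {
                return missing; // nothing registered in this jar
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(services), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Service files allow '#' comments and blank lines.
                    int hash = line.indexOf('#');
                    String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                    if (name.isEmpty()) {
                        continue;
                    }
                    String classEntry = name.replace('.', '/') + ".class";
                    if (jar.getJarEntry(classEntry) == null) {
                        missing.add(name);
                    }
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: BundleServiceCheck <path-to-bundle-jar>");
            return;
        }
        for (String name : missingProviders(args[0])) {
            System.out.println("registered but not packaged: " + name);
        }
    }
}
```

Run against a bundle jar with the problem described here, a check like this would be expected to report `org.apache.hudi.Spark32PlusDefaultSource`. It only inspects the jar's contents and does not exercise Spark's actual lookup path.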

Impact

Risk level: low

This change only fixes the hudi-spark3-bundle packaging, so that the classes missing from the bundle are included and the provider can be found at runtime.

Tested locally and on EMR that the hudi-spark3-bundle works after the fix.

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

yihua avatar Sep 21 '22 23:09 yihua

@yihua : can you check CI failure?

nsivabalan avatar Sep 22 '22 22:09 nsivabalan

@nsivabalan CI passes after retries. It was flaky. Merging this fix.

yihua avatar Sep 23 '22 21:09 yihua