
The Scala project under examples causes NoClassDefFoundError

Open cfangplus opened this issue 6 years ago • 11 comments

I followed the steps described at https://github.com/rapidsai/spark-examples/blob/master/getting-started-guides/building-sample-apps/scala.md and built the Scala project. I then used spark-submit to submit the application to the cluster and got the exception 'java.lang.NoClassDefFoundError: scala/Product$class'. It seems that the jar produced by the mvn command does not contain the Scala library. Please see the attached file for details: NoClassDefFoundError

cfangplus avatar Dec 13 '19 13:12 cfangplus

I noticed the assembly-no-scala.xml file in the project and found that the Scala library is excluded. Why is that?

cfangplus avatar Dec 13 '19 13:12 cfangplus

Hi cfangplus,

Which Spark version were you using? The examples here are not yet compatible with the latest Spark-3.0.0-preview. If you run into the same issue on Spark 2.x, please provide your environment details and more logs so I can try to reproduce it.

As for the Scala library, I think it's already provided by the Spark runtime.
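
For context, `scala/Product$class` exists in the Scala 2.11 standard library but not in 2.12, where traits are compiled differently. This error therefore typically appears when a jar compiled against Scala 2.11 runs on a Scala 2.12 runtime. A minimal sketch for checking which Scala version a runtime provides is shown here; the names `ScalaRuntimeCheck`, `binaryVersion`, and `hasClass` are illustrative assumptions and are not part of the examples project.

```scala
import scala.util.Properties

// Illustrative helper (hypothetical, not part of spark-examples).
object ScalaRuntimeCheck {
  // Reduce a full version such as "2.12.10" to its binary version "2.12".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  // True if the named class can be loaded from the current classpath.
  def hasClass(name: String): Boolean =
    try { Class.forName(name); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    val full = Properties.versionNumberString
    println(s"Scala runtime: $full (binary ${binaryVersion(full)})")
    // Present on Scala 2.11, absent on 2.12 and later.
    println("scala.Product$class present: " + hasClass("scala.Product$class"))
  }
}
```

Pasting the object into spark-shell on the cluster and calling `ScalaRuntimeCheck.main(Array.empty)` should report the Scala version the runtime provides, which can then be compared with the Scala version the application jar was built against.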

chuanlihao avatar Dec 17 '19 02:12 chuanlihao

Yes, I noticed that. I used spark-submit to submit the application to a Spark 3.0 cluster and the Scala library conflicted. Do you know how to fix this, or how to make this Scala project compatible with Spark 3.0?

cfangplus avatar Dec 17 '19 06:12 cfangplus

Spark 3.0 support is currently in development. My suggestion is to use Spark 2.x until the 3.0-compatible release is available.

There is no easy way to fix the compatibility issue. Both the xgboost project and the examples project must be updated and rebuilt, which is complex.

chuanlihao avatar Dec 17 '19 06:12 chuanlihao

I know this project has been in development since June 2019, when the CUDA version was 10.1. I now have a GPU environment with CUDA 10.2. How can I get cudf-0.9.2-cuda10-2.jar and libxgboost4j.so with CUDA 10.2 support?

cfangplus avatar Dec 18 '19 08:12 cfangplus

The team is planning to support CUDA 10.2.

For now, you could install both CUDA 10.1 and 10.2 on your server and run these examples with CUDA 10.1: https://stackoverflow.com/questions/41330798/install-multiple-versions-of-cuda-and-cudnn

chuanlihao avatar Dec 18 '19 09:12 chuanlihao

That's great. Now I have another question: why was this spark-examples project proposed? As we know, NVIDIA RAPIDS + Dask can provide distributed data processing, machine learning, and graph computing, so Apache Spark does not seem to be needed, right?

cfangplus avatar Dec 19 '19 11:12 cfangplus

I think Joshua answered your question here: https://github.com/rapidsai/cudf/issues/3643

Also, as specified in README.md, this repo provides docs and example applications that demonstrate the RAPIDS.ai GPU-accelerated XGBoost-Spark project.

chuanlihao avatar Dec 20 '19 12:12 chuanlihao

@chuanlihao Thank you for your reply. I recently ran this program with CUDA 10.1 and the results were good. As we know, the core of XGBoost is written in C/C++ and exposed as a shared library to the Python, JVM, R, and other language APIs. Could cuML likewise be used from Spark via JNI? Do you have a similar idea or plan?

cfangplus avatar Dec 31 '19 09:12 cfangplus

Currently cuML only has Python bindings. Technically, we could apply an approach to cuML similar to the one used for XGBoost. Please let us know your use cases for cuML on Spark.

anfeng avatar Jan 03 '20 17:01 anfeng

@anfeng Thanks. I mean that we want to accelerate our Spark ML/Graph applications with GPUs.

cfangplus avatar Jan 10 '20 03:01 cfangplus