sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
The code is supposed to use the map-side partition number, but mistakenly uses the reduce-side partition number. Also, I believe this fix will help with this issue: https://github.com/openucx/sparkucx/issues/30
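The snippet above does not show the affected code, so the following is only a hypothetical sketch of the class of bug it describes, not the actual sparkucx fix. It assumes the usual shuffle layout in which each reducer fetches one block from every map task; the function name `blocks_for_reducer` and the partition counts are made up for illustration.

```python
from typing import List, Tuple


def blocks_for_reducer(reduce_id: int, num_map_partitions: int) -> List[Tuple[int, int]]:
    """Return the (map_id, reduce_id) blocks one reducer fetches: one per map task."""
    return [(map_id, reduce_id) for map_id in range(num_map_partitions)]


# Hypothetical sizes, chosen only so the two counts differ.
num_maps, num_reduces = 4, 2

# Correct: the loop bound is the map-side partition count.
correct = blocks_for_reducer(reduce_id=1, num_map_partitions=num_maps)

# Buggy: the reduce-side count is passed where the map-side count belongs,
# so blocks from map tasks 2 and 3 are never requested.
buggy = blocks_for_reducer(reduce_id=1, num_map_partitions=num_reduces)

print(len(correct), len(buggy))
```

When the two counts happen to be equal the mistake is invisible, which is one reason this kind of mix-up can survive basic testing.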
# Configuration
* Operating system: Ubuntu 16.04.6 LTS
* Kernel: 4.4.0-135-generic
* UCX: UCX Release v1.9.0 configured with `./contrib/configure-release --with-java`
* Java: Oracle JDK 11.0.8
* Spark: Apache Spark 3.0.1...
We are working on Spark 3.1.2 with Java 11 and don't want to downgrade our cluster. Can somebody tell me if it will also work on Java 11...
Didn't get worker address for BlockManagerId(2, xxx, xxx, None) during 3600
For reference: this branch seems to work with the GPU unified API.
Working implementation of Spark Shuffle manager with UcxShuffleTransport API.

TODO:
1. Run tests
2. Publish to maven
3. Port one-sided API protocol
To run a benchmark:
```
mvn package
```
This produces 2 jars: ucx-spark-2.0-for-spark-3.0.0-jar-with-dependencies.jar and ucx-spark-2.0-for-spark-3.0.0-tests.jar. Put them in a jars folder together with cudf.

Server:
```
java -cp /PATH_TO_UCX/lib/:spark/jars/*:jars/* org.apache.spark.shuffle.ucx.perf.UcxShuffleTransportPerfTool...
```
There is only TCP support on my machine, and UCX reports: [1597129379.385163] [server:20110:1] select.c:434 UCX ERROR no remote registered memory access transport to client:6868: tcp/bond0 - no put...