module not found: graphframes#graphframes;0.5.0-spark2.1-s_2.11
Hi,
I am trying to follow the instructions at https://spark.rstudio.com/graphframes/ for running graphframes with Spark version 2.1.0.
However, I am running into an issue similar to the one described in https://github.com/rstudio/graphframes/issues/7.
That is, after installing Spark with:
sparklyr::spark_install(version = "2.1.0")
I can connect to spark in a fresh R session via:
library(sparklyr)
sc <- spark_connect(master = "local", version = "2.1.0")
However, when graphframes is also loaded, the connection fails with the following error:
> library(sparklyr)
> library(graphframes)
> sc <- spark_connect(master = "local", version = "2.1.0", config = conf)
Ivy Default Cache set to: /Users/ludwig/.ivy2/cache
The jars for the packages stored in: /Users/ludwig/.ivy2/jars
:: loading settings :: url = jar:file:/Users/ludwig/spark/spark-2.1.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
:: resolution report :: resolve 1226ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
module not found: graphframes#graphframes;0.5.0-spark2.1-s_2.11
==== local-m2-cache: tried
file:/Users/ludwig/.m2/repository/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.pom
-- artifact graphframes#graphframes;0.5.0-spark2.1-s_2.11!graphframes.jar:
file:/Users/ludwig/.m2/repository/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.jar
==== local-ivy-cache: tried
/Users/ludwig/.ivy2/local/graphframes/graphframes/0.5.0-spark2.1-s_2.11/ivys/ivy.xml
-- artifact graphframes#graphframes;0.5.0-spark2.1-s_2.11!graphframes.jar:
/Users/ludwig/.ivy2/local/graphframes/graphframes/0.5.0-spark2.1-s_2.11/jars/graphframes.jar
==== central: tried
https://repo1.maven.org/maven2/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.pom
-- artifact graphframes#graphframes;0.5.0-spark2.1-s_2.11!graphframes.jar:
https://repo1.maven.org/maven2/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.jar
==== spark-packages: tried
http://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.pom
-- artifact graphframes#graphframes;0.5.0-spark2.1-s_2.11!graphframes.jar:
http://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: graphframes#graphframes;0.5.0-spark2.1-s_2.11: not found
::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: graphframes#graphframes;0.5.0-spark2.1-s_2.11: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1078)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Error in spark_connect_gateway(gatewayAddress, gatewayPort, sessionId, :
Gateway in localhost:8880 did not respond.
I have tried using a more recent version of Spark (2.4.3), as well as placing the apparently missing graphframes jars directly into the jars directory, without success.
Any advice on how to resolve this would be greatly appreciated. Thanks!
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.0.7 graphframes_0.1.2 sparklyr_1.7.0
loaded via a namespace (and not attached):
[1] pillar_1.6.2 compiler_4.1.0 BiocManager_1.30.16
[4] dbplyr_2.1.1 prettyunits_1.1.1 remotes_2.4.0
[7] r2d3_0.2.5 base64enc_0.1-3 tools_4.1.0
[10] pkgbuild_1.2.0 digest_0.6.27 jsonlite_1.7.2
[13] lifecycle_1.0.0 tibble_3.1.3 pkgconfig_2.0.3
[16] rlang_0.4.11 cli_3.0.1 DBI_1.1.1
[19] rstudioapi_0.13 curl_4.3.2 yaml_2.2.1
[22] parallel_4.1.0 withr_2.4.2 httr_1.4.2
[25] generics_0.1.0 vctrs_0.3.8 htmlwidgets_1.5.3
[28] askpass_1.1 rappdirs_0.3.3 rprojroot_2.0.2
[31] tidyselect_1.1.1 glue_1.4.2 forge_0.2.0
[34] R6_2.5.0 processx_3.5.2 fansi_0.5.0
[37] callr_3.7.0 purrr_0.3.4 tidyr_1.1.3
[40] magrittr_2.0.1 ps_1.6.0 ellipsis_0.3.2
[43] htmltools_0.5.1.1 assertthat_0.2.1 config_0.3.1
[46] utf8_1.2.2 openssl_1.4.4 crayon_1.4.1
The problem seems to be that the default repositories that sparklyr tries to resolve graphframes from (https://repo1.maven.org and http://dl.bintray.com) no longer host the graphframes jars.
This can be fixed by adding "https://repos.spark-packages.org" to the list of repositories, as done here.
Also, the code can be updated to pull the latest version of graphframes (v0.8.1, Sep 2020), which works with Spark 2.4 and higher, as done here.
I can provide a pull request if it seems worth incorporating these updates.
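Until such a change is released, a user-side workaround along the following lines might be enough. This is only a sketch and has not been tested against the setup above. It assumes that sparklyr forwards `sparklyr.shell.*` config entries to spark-submit as command line flags, so that `sparklyr.shell.repositories` becomes `--repositories`, which spark-submit uses as an additional resolver for `--packages` coordinates. The variable name `config` is arbitrary.

```r
library(sparklyr)
library(graphframes)

# Start from the default sparklyr configuration.
config <- spark_config()

# Assumption: sparklyr passes this entry to spark-submit as
# `--repositories https://repos.spark-packages.org`, adding the
# spark-packages repository to the ones tried in the log above.
config[["sparklyr.shell.repositories"]] <- "https://repos.spark-packages.org"

# Same connection call as before, now with the extra repository configured.
sc <- spark_connect(master = "local", version = "2.1.0", config = config)
```

If the unresolved dependency error still appears, the `==== ... : tried` entries in the resolution log should show whether the additional repository was actually consulted.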