Querying a Cassandra DB via Spark
Hey there,
As the title says, I am trying to query an existing Cassandra DB from Node.js using your library, running against a Spark cluster on a LAN.
Here's what I have done so far, using:
- CentOS 7
- node 4.4.4
- apache-spark-node
- spark 1.6.1
- cassandra 2.2.5
- spark-cassandra-connector 1.6.0-M1
From the root of my project:

```sh
ASSEMBLY_JAR=/usr/share/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar node_modules/apache-spark-node/bin/spark-node \
  --master spark://192.168.1.101:7077 --conf spark.cores.max=4 \
  --jars /root/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.6.0-M1-36-g220aa37.jar
```
Once I had access to the spark-node shell, I tried:

```
spark-node> sqlContext.sql("SELECT count(*) FROM mykeyspace.mytable")
```
but of course I get:

```
Error: Error creating class
org.apache.spark.sql.AnalysisException: Table not found: `mykeyspace`.`mytable`;
```
I then tried to adapt a Scala snippet I found in a Stack Overflow post:

```javascript
var df = sqlContext
    .read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", "mytable")
    .option("keyspace", "mykeyspace")
    .load(null, function(err, res) { console.log(err); console.log(res); });
```
but all I get is:

```
Error: Error running instance method
java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark-packages.org
```
The problem surely comes from the fact that I don't understand half of how everything fits together, which is why I'm here asking for help. All I need is a way to run basic SQL queries (with only WHERE clauses) against one Cassandra table.
I reckon this project is no longer maintained, but as far as I can tell it's the simplest solution I've found so far (solutions like EclairJS offer far more functionality than I need, at the cost of added complexity and perhaps lower performance), and it would fit my needs exactly.
You should post your complete code. According to the docs you need to set up the SparkContext with the right configuration properties.
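For example, a launch command along these lines should wire the connector into the session (untested sketch; `spark.cassandra.connection.host` is the standard connector property, and I'm assuming your Cassandra node is reachable at 192.168.1.101 — substitute the address of an actual Cassandra contact point):

```sh
# Same invocation as above, plus the Cassandra connection property
ASSEMBLY_JAR=/usr/share/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar \
node_modules/apache-spark-node/bin/spark-node \
  --master spark://192.168.1.101:7077 \
  --conf spark.cores.max=4 \
  --conf spark.cassandra.connection.host=192.168.1.101 \
  --jars /root/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.6.0-M1-36-g220aa37.jar
```

The `ClassNotFoundException` on the data source usually means the connector jar never made it onto the driver classpath, so it's also worth double-checking the `--jars` path, or pulling the connector via Spark's `--packages` flag with the coordinates listed on spark-packages.org instead.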
Furthermore, there is an example on how to use SparkSQL.
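Once the data source class resolves, a session roughly like the following should cover the WHERE-clause queries you mentioned (untested sketch; `some_column`/`some_value` are placeholders, and I'm assuming the Node bindings expose `registerTempTable` the same way the underlying Scala API does — your own snippet already shows the callback form of `load`):

```
spark-node> var df = sqlContext.read()
              .format("org.apache.spark.sql.cassandra")
              .option("keyspace", "mykeyspace")
              .option("table", "mytable")
              .load(null, function(err, res) { df = res; })
spark-node> df.registerTempTable("mytable")
spark-node> sqlContext.sql("SELECT * FROM mytable WHERE some_column = 'some_value'")
```

Registering the DataFrame as a temp table is what makes the table name visible to `sqlContext.sql`, which is why your first `SELECT` failed with "Table not found".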
Basically, this is not an issue with apache-spark-node itself and should be closed accordingly.
Hi @Enilia - as @tobilg says, this doesn't appear to be an issue, but if we're missing something, please post a more complete description and I'll do my best to help. (This project is still maintained, btw.)
Hi and thanks for the quick reply,
I'm sorry I assumed the project was no longer maintained; I got that impression from the low activity on the repo over the last few months :s . Anyway, I'm glad you're still active on this project. I'll take a look at the links @tobilg posted here and open a more complete issue if something is still missing. I'm still new to the Cassandra/Spark/Java/Scala universe, so I'm a bit lost here, tbh ^^
Best regards, Eni