
Querying a Cassandra DB via Spark

Open Enilia opened this issue 9 years ago • 3 comments

Hey there,

As the title says, I am trying to query an existing Cassandra DB from Node.js using your library. I am using a Spark cluster on a LAN.

Here's what I have done so far, using:

  • CentOS 7
  • node 4.4.4
  • apache-spark-node
  • spark 1.6.1
  • cassandra 2.2.5
  • spark-cassandra-connector 1.6.0-M1

From the root of my project, I run:

ASSEMBLY_JAR=/usr/share/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar node_modules/apache-spark-node/bin/spark-node \
--master spark://192.168.1.101:7077 --conf spark.cores.max=4 \
--jars /root/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.6.0-M1-36-g220aa37.jar

Once I had access to the spark-node prompt, I tried:

spark-node> sqlContext.sql("Select count(*) from mykeyspace.mytable")

but of course I get:

Error: Error creating class
org.apache.spark.sql.AnalysisException: Table not found: `mykeyspace`.`mytable`;

I then tried to adapt a snippet of Scala I had seen in a Stack Overflow post:

var df = sqlContext
  .read()
  .format("org.apache.spark.sql.cassandra")
  .option("table", "mytable")
  .option("keyspace", "mykeyspace")
  .load(null, function(err, res) { console.log(err); console.log(res) }) 

but all I get is:

Error: Error running instance method
java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark-packages.org

The problem surely comes from the fact that I don't understand half of how everything is linked together, which is why I'm here asking for help. All I need is a way to run basic SQL queries (with only WHERE clauses) against a single Cassandra table.

I reckon this project is no longer maintained, but as far as I can see it is the simplest solution I have found so far (solutions like EclairJS have far more functionality than I need, at the cost of more complexity and maybe lower performance), and it would cover my needs.

Enilia • May 11 '16 16:05

You should post your complete code. According to the docs you need to set up the SparkContext with the right configuration properties.
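For illustration only, here is a minimal sketch of one way such a property could be passed at launch, reusing the `--conf` flag from the command in the original post. It assumes the property in question is the connector's `spark.cassandra.connection.host`. The helper `buildSparkNodeArgs`, the Cassandra address `192.168.1.102`, and the jar path are hypothetical placeholders, not values from this thread. It only shows how the arguments are assembled and may not by itself resolve the `ClassNotFoundException` above.

```javascript
// Hypothetical helper: assembles an argument list for bin/spark-node.
// It only builds strings; it does not start Spark or contact Cassandra.
function buildSparkNodeArgs(opts) {
  // Copy any user-supplied Spark properties so the input is not mutated.
  var conf = Object.assign({}, opts.conf);
  if (opts.cassandraHost) {
    // Property read by spark-cassandra-connector to locate the Cassandra cluster.
    conf['spark.cassandra.connection.host'] = opts.cassandraHost;
  }
  var args = ['--master', opts.master];
  Object.keys(conf).forEach(function (key) {
    args.push('--conf', key + '=' + conf[key]);
  });
  if (opts.jars && opts.jars.length > 0) {
    // --jars takes a single comma-separated list.
    args.push('--jars', opts.jars.join(','));
  }
  return args;
}

var args = buildSparkNodeArgs({
  master: 'spark://192.168.1.101:7077',
  conf: { 'spark.cores.max': 4 },
  cassandraHost: '192.168.1.102', // assumed address, not from the thread
  jars: ['/path/to/spark-cassandra-connector-assembly.jar'] // placeholder path
});

console.log(args.join(' '));
```

The resulting list can be appended to the `spark-node` invocation from the original post, with the real Cassandra node address and connector jar path substituted.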

Furthermore, there is an example of how to use SparkSQL.

Basically, this is not an issue of apache-spark-node and should be closed accordingly.

tobilg • May 11 '16 16:05

Hi @Enilia - as @tobilg says, this doesn't appear to be an issue with the library, but if we're missing something, please post a more complete description and I'll do my best to help. (This project is still maintained, btw.)

henridf • May 12 '16 04:05

Hi, and thanks for the quick reply,

Sorry for assuming the project was no longer maintained; I got that impression from the low activity on the repo over the last few months :s. Anyway, I'm glad you're still active on this project. I'll take a look at the links @tobilg gave here and post a more complete issue if something is still missing. I'm still new to the Cassandra/Spark/Java/Scala universe, so I'm a bit lost here tbh ^^

Best regards, Eni

Enilia • May 12 '16 09:05