
Integration with Accumulo

Open madhvi-gupta opened this issue 10 years ago • 3 comments

How can Accumulo be made a data source for distributedR, so that analytics can be run over that data in parallel?

madhvi-gupta avatar Aug 05 '15 05:08 madhvi-gupta

Hi Madhvi, the issue with running distributedR against Accumulo is that you need a connector to read data from Accumulo into R. We have neither created nor tested any data loaders for Accumulo. You are welcome to look for other open source R-Accumulo connectors. A quick search turns up https://github.com/DataTacticsCorp/raccumulo (though I have no idea whether it works or not).

We will soon release an HDFS connector. It will help you load data directly from HDFS and run distributedR applications.

fun-indra avatar Aug 05 '15 20:08 fun-indra

Hi Indra,

I am currently trying to use raccumulo (the GitHub link you shared) to load data into distributedR, but it is not working as required. It does not provide the whole data set to be loaded into R.

Thanks and regards, Madhvi Gupta

madhvi-gupta avatar Aug 06 '15 04:08 madhvi-gupta

As I mentioned, we have not tried or tested any Accumulo connectors. Still, are you able to load the data in a single R session (without distributedR) using that connector? What code did you use with distributedR? What is the error? How much data is getting loaded?

fun-indra avatar Aug 07 '15 23:08 fun-indra