Writing Data to Elasticsearch Storage Engine
Task Description
This task is currently in progress to provide Elasticsearch as a backend storage engine option for Sparkler. It builds upon the Factory Pattern outlined in Issue 218, where we abstract out the storage engine-specific implementation.
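For readers who haven't followed Issue 218, the rough shape of that abstraction might look like the sketch below. This is hypothetical: the trait and method names are assumptions loosely modeled on the storageProxy calls mentioned later in this issue, not the actual Sparkler code.

```scala
// Hypothetical sketch of the factory-pattern abstraction from Issue 218:
// callers depend only on a StorageProxy trait, and the concrete engine
// is chosen from configuration. Names are illustrative, not Sparkler's API.
trait StorageProxy {
  def addResourceDocs(docs: Iterator[Map[String, AnyRef]]): Unit
  def commitCrawlDb(): Unit
  def close(): Unit
}

class SolrProxy extends StorageProxy {
  override def addResourceDocs(docs: Iterator[Map[String, AnyRef]]): Unit =
    docs.foreach(d => println(s"[solr] index: $d")) // placeholder for SolrJ calls
  override def commitCrawlDb(): Unit = println("[solr] commit")
  override def close(): Unit = ()
}

class ElasticsearchProxy extends StorageProxy {
  override def addResourceDocs(docs: Iterator[Map[String, AnyRef]]): Unit =
    docs.foreach(d => println(s"[es] index: $d")) // placeholder for ES client calls
  override def commitCrawlDb(): Unit = println("[es] refresh") // ES has no commit; refresh is the closest analogue
  override def close(): Unit = ()
}

object StorageProxyFactory {
  def getProxy(engine: String): StorageProxy = engine.toLowerCase match {
    case "solr"          => new SolrProxy
    case "elasticsearch" => new ElasticsearchProxy
    case other           => throw new IllegalArgumentException(s"Unknown storage engine: $other")
  }
}
```

With this shape, Crawler.scala would only ever call the trait's methods, so adding Elasticsearch becomes a matter of supplying a new implementation plus a configuration switch.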
To achieve the final goal of writing Sparkler data into the Elasticsearch storage engine, the team envisions the following steps:
- Make sure the Elasticsearch storage engine is set up appropriately and ready to accept data
- Write simple data to Elasticsearch (a smoke-test sketch follows this list)
  a. Perhaps a simple visualization to prove functionality
- Reorganize Sparkler data into a format conducive to Elasticsearch indexing (a mapping sketch also follows this list)
- Write data into Elasticsearch
- Visualize data in Elasticsearch (this will likely be brought up in a future issue)
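As a rough illustration of the first two steps, a smoke test along these lines could confirm the cluster is reachable and accepts a document. This is a sketch using the Elasticsearch 7.x High Level REST Client; the host, port, index name (crawldb), and field names are all assumptions, not Sparkler's actual schema.

```scala
import org.apache.http.HttpHost
import org.elasticsearch.action.admin.cluster.health.ClusterHealthRequest
import org.elasticsearch.action.index.IndexRequest
import org.elasticsearch.client.{RequestOptions, RestClient, RestHighLevelClient}
import org.elasticsearch.common.xcontent.XContentType

object EsSmokeTest {
  def main(args: Array[String]): Unit = {
    // Assumes a local single-node cluster; adjust host/port as needed.
    val client = new RestHighLevelClient(
      RestClient.builder(new HttpHost("localhost", 9200, "http")))
    try {
      // Step 1: verify the cluster is up and ready to accept data.
      val health = client.cluster().health(new ClusterHealthRequest(), RequestOptions.DEFAULT)
      println(s"Cluster status: ${health.getStatus}")

      // Step 2: write one simple document. Index and field names are
      // placeholders, not Sparkler's actual crawldb schema.
      val doc = """{"url": "http://example.com", "status": "UNFETCHED", "depth": 0}"""
      val response = client.index(
        new IndexRequest("crawldb").id("example-1").source(doc, XContentType.JSON),
        RequestOptions.DEFAULT)
      println(s"Indexed document, result: ${response.getResult}")
    } finally {
      client.close()
    }
  }
}
```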
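For the reorganization step, one plausible approach is to create the index with an explicit mapping up front, so Sparkler's crawl fields are indexed with appropriate types before any bulk writes begin. Again a sketch under the same assumptions; the field list is illustrative, not the real crawldb schema.

```scala
import org.apache.http.HttpHost
import org.elasticsearch.client.indices.CreateIndexRequest
import org.elasticsearch.client.{RequestOptions, RestClient, RestHighLevelClient}
import org.elasticsearch.common.xcontent.XContentType

object CreateCrawlDbIndex {
  def main(args: Array[String]): Unit = {
    val client = new RestHighLevelClient(
      RestClient.builder(new HttpHost("localhost", 9200, "http")))
    try {
      // Hypothetical mapping: field names and types are illustrative,
      // chosen to show the pattern rather than mirror Sparkler's schema.
      val mapping =
        """{
          |  "properties": {
          |    "url":        {"type": "keyword"},
          |    "status":     {"type": "keyword"},
          |    "depth":      {"type": "integer"},
          |    "fetch_time": {"type": "date"},
          |    "content":    {"type": "text"}
          |  }
          |}""".stripMargin
      val request = new CreateIndexRequest("crawldb").mapping(mapping, XContentType.JSON)
      val response = client.indices().create(request, RequestOptions.DEFAULT)
      println(s"Index created: ${response.isAcknowledged}")
    } finally {
      client.close()
    }
  }
}
```

Defining the mapping explicitly (rather than relying on dynamic mapping) keeps field types stable across crawl runs, which should make later querying and visualization more predictable.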
This is a WIP and updates will be posted here as we make progress.
@thammegowda @buggtb @lewismc We had a few questions about Crawler.scala while adding Elasticsearch support:
- How does the deep crawl differ from a "normal" crawl? It looks like the deep crawl runs only when the -dc flag is enabled, while the normal crawl always runs. Is that correct?
- What does the FairFetcher class do? Do we need to understand it, given that FairFetcher is not specific to Solr?
- Why is "storageProxy.commitCrawlDb()" called before the crawl, after the deep crawl, and then again after the normal crawl?